Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiedemann linear algebra over GF2 and implement large prime variants. #119

Closed
minrk opened this issue Sep 8, 2015 · 22 comments

Comments

@minrk
Copy link
Contributor

minrk commented Sep 8, 2015

Context

One of the approaches toward OpenDreamKit's aim of High Performance Mathematical Computing is to explore the addition of very fine grained parallelism (e.g. threading or SIMD) to some key computational components like Flint and MPIR. In this deliverable, we tackle two typical algorithms of computer algebra: integer factorization and block Wiedemann for finding kernel vectors of matrices over a finite field.

Whilst the threading of the quadratic sieve looks very promising, our conclusion is that the SIMD speedup that we implemented for the block Wiedemann algorithm (and the threaded experiments we did) weren't particularly useful for the quadratic sieve. We expect that only a number field sieve implementation, with matrices that are truly massive (millions of rows) would benefit from the kind of speedup we saw.

This doesn't mean that SIMD is not useful in general, it very much is, see for example D5.5 and D5.7 where SIMD proves to be a big win. SIMD just hasn't proved to be a major benefit for a project like the quadratic sieve, where only a small part of the runtime depends on the linear algebra phase of the algorithm.

The other improvements for the quadratic sieve that we describe below, such as threading the relation sieving where we get a speedup of up to 3 on 4 cores, and the various algorithmic improvements that we describe below, turned out to be much more important in practice.

Report on writing a parallel implementation of the quadratic sieve

Problem description

The Quadratic Sieve is an algorithm for factoring integers n = pq, with n typically in the 15-90 decimal digit range. For this deliverable, we have implemented a quadratic sieve with the following features:

  • Small prime variant
  • Large prime variant with in-memory hash table and on-disk pseudorelation storage
  • Multiple-polynomial self initialisation with Carrier-Wagstaff method
  • Polynomial generation using Bradford-Monagan-Percival method
  • Cache efficient sieving
  • Block-Lanczos linear algebra (adapted from msieve)
  • Knuth-Schroeppel multiplier
  • Threaded relation sieving (using OpenMP)

Results

Some partial progress towards this goal had already been completed before the OpenDreamKit project began, but the code didn't compile or run and was in an unusable state. This included a partial implementation of a single large prime variant quadratic sieve implemented by a Google Summer of Code student, based on a prior version for small integers only. This was not parallelised and a rough sketch only. It didn't compile and wasn't correct or complete.

We have completed the implementation and the code has already been merged into the Flint repository.

Our implementation is not competitive below 130 bits (~40 digits), due to the Carrier-Wagstaff/Bradford-Monagan-Percival combination. Here are some timings for larger factorisations comparing Pari/GP with our implementation in Flint on one and four cores.

In our tests, no significant improvements were observed for larger numbers of cores, so we report only on the improvements for four cores here.

n (bits) Pari Flint 4 core
130 0.11 0.24 0.28
140 0.22 0.33 0.30
150 0.43 0.55 0.39
160 0.79 0.99 0.56
170 1.8 1.4 0.83
180 3.8 3.0 1.7
190 8.6 4.8 2.4
200 17.6 10.2 6.0
210 38.4 20.9 13.5
220 67.7 33.2 15.8
230 199 78 34
240 257 140 52
250 444 248 92
260 1175 708 283
270 2199 1522 544

Future possibilities

Possible future projects could include:

  • Cache efficient handling of factor base and self-initialisation data structures
  • PPMPQS method
  • Parallelising the file handling
  • Writing an efficient self-initialising MPQS for 20-128 bits.

Testing the quadratic sieve code

The Flint repository is available here: https://github.com/wbhart/flint2

To build and test the code mentioned above, you must have GMP/MPIR installed on your machine (refer to your system documentation for how to do this). Then do:

git clone https://github.com/wbhart/flint2
cd flint
./configure --with-mpir=/path/to/mpir --with-mpfr=/path/to/mpfr --enable-openmp
export OMP_NUM_THREADS=4
make
make check MOD=qsieve

Full instructions on how to build Flint are available in the Flint documentation, available at the Flint website.

The quadratic sieve functionality is made available by including qsieve.h, and the main interface is via the function:

void qsieve_factor(fmpz_factor_t factors, const fmpz_t n)

Blog post

A blog post about the improvements to the quadratic sieve in Flint is available at https://wbhart.blogspot.de/2017/02/integer-factorisation-in-flint.html .

Report on parallel block Wiedemann

Problem description

The Block Wiedemann algorithm computes kernel vectors of a matrix over a finite field. Along with the Block Lanczos algorithm it is often used in the Quadratic Sieve and Number Field sieve over GF2, but also has other applications over a prime p.

Work done

We investigated parallelising the Block Wiedemann algorithm using the external library M4RI for basic linear algebra over GF2. This required three components to be written.

  • A sparse matrix library over GFp for Flint
  • An implementation of the Block Wiedemann algorithm over GF2
  • A naive quadratic sieve to generate realistic data

These were implemented by Anders Jensen and Alex Best and are available as modules for Flint and as an external implementation respectively.

Conclusions

Our experiments showed that it was possible to make use of the highly optimised SIMD arithmetic in the external library M4RI as a form of parallelism, to speed up the Block Wiedermann algorithm, and that this outperformed Block Wiedermann using M4RI in threaded mode, at least for the size and sparsity of problem encountered in the quadratic sieve! The result was comparable to the Block Lanczos code already used in Flint for the quadratic sieve.

The sparse matrix library was not the focus of this deliverable, but will be expanded at a later date and included in Flint. Because of the additional external dependency on M4RI, the Block Wiedemann implementation isn't eligible for inclusion in Flint, but the code is available to members of the OpenDreamKit collaboration. Future work may see the removal of this external dependency so that it can be merged into Flint. However, we currently feel that this isn't something that should be made a priority.

In fact, the quadratic sieve implementation we now have, even in parallel mode, doesn't have linear algebra as a bottleneck. This is because of the extremely small factor base sizes that are possible with the particular implementation of the quadratic sieve that we completed as part of this project.

@minrk minrk added this to the D5.6 milestone Sep 8, 2015
@nthiery nthiery modified the milestones: Month 18: 2017-02-28, D5.6 Mar 22, 2016
@ClementPernet ClementPernet assigned wbhart and unassigned ClementPernet Sep 9, 2016
@bpilorget
Copy link
Contributor

@ClementPernet (WP leader) and @wbhart (Lead beneficiary)
This deliverable is due for February 2017

@wbhart
Copy link
Contributor

wbhart commented Nov 21, 2016

The Block Wiederman is in good shape. The triple large prime variant, not
so much. I'm working on it myself, but am only able to work about 6 days a
month on it. Anders Jensen was the person we had hired to work on this, and
he left us for a permanent position.

At this stage I am certain I can get the single large prime variant working
in time. I think I can get the double large prime variant working in
time. If the triple large prime variant turns out to not be too much more
work, I can do it.

On the other hand, I have made really significant progress towards the 48
month deliverables, i.e. the polynomial multiplication, division, gcd, etc.
Unfortunately, we can't switch the deliverables around though. The
polynomial stuff certainly won't be ready by February. We are just way
ahead on that 48 month project.

In summary, I'm a little bit concerned at this point.

Bill.

On 21 November 2016 at 17:24, bpilorget notifications@github.com wrote:

@ClementPernet https://github.com/ClementPernet (WP leader) and @wbhart
https://github.com/wbhart (Lead beneficiary)
This deliverable is due for February 2017


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#119 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAOzpqTrIwMC-y5JxvwB_ZDLUcAIuzKDks5rAcWygaJpZM4F5zGd
.

@bpilorget
Copy link
Contributor

Thank you for all the information. The problem is that Month 18, February 2017, is a very strict deadline for the Commission. I let @ClementPernet and @nthiery react if they have any suggestion to help you with this.

@wbhart
Copy link
Contributor

wbhart commented Nov 21, 2016

My post was simply meant to be an up-to-date progress report. We are of
course fully aware of all the issues locally and can work through potential
solutions here.

If I were to request any changes at this stage it would simply be to change
the wording of the deliverable from triple to double large prime variant.
Even with this change, we would still be delivering a parallelised,
internationally competitive quadratic sieve.

The two circumstances that led to this were that Anders got a permanent
position (we were sad that he left, but happy that he got a permanent job!)
and secondly, preliminary work on the double large prime variant that was
supposed to be done before ODK started, never got completed.

So, to put it more concretely as a question: is it possible to change the
word "triple" to "double" in the deliverable for this project, and if we
did that, would anyone object? It certainly wouldn't be a change that would
affect any other deliverables for ODK, and would scarcely affect the
outcome, but would potentially save a lot of unnecessary panic (and work).

On 21 November 2016 at 17:47, bpilorget notifications@github.com wrote:

Thank you for all the information. The problem is that Month 18, February
2017, is a very strict deadline for the Commission. I let @ClementPernet
https://github.com/ClementPernet and @nthiery
https://github.com/nthiery react if they have any suggestion to help
you with this.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#119 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAOzpo1LTPHGToF6oZyP2GGqTWGEB5P6ks5rAcsPgaJpZM4F5zGd
.

@wbhart
Copy link
Contributor

wbhart commented Nov 23, 2016

Ping!

Just wondering if anyone has any comments about our suggestion to change
the deliverable from a triple large prime variant sieve to double large
prime variant.

I'd prefer to deliver something proven and solid than to take a risk on
something unproved (which could only feasibly improve things in the 110-135
digit range, on very large clusters anyway).

But it depends on what other ODK participants want, and whether this change
is even possible to make to the deliverable at this stage.

Bill.

On 21 November 2016 at 21:25, Bill Hart goodwillhart@googlemail.com wrote:

My post was simply meant to be an up-to-date progress report. We are of
course fully aware of all the issues locally and can work through potential
solutions here.

If I were to request any changes at this stage it would simply be to
change the wording of the deliverable from triple to double large prime
variant. Even with this change, we would still be delivering a
parallelised, internationally competitive quadratic sieve.

The two circumstances that led to this were that Anders got a permanent
position (we were sad that he left, but happy that he got a permanent job!)
and secondly, preliminary work on the double large prime variant that was
supposed to be done before ODK started, never got completed.

So, to put it more concretely as a question: is it possible to change the
word "triple" to "double" in the deliverable for this project, and if we
did that, would anyone object? It certainly wouldn't be a change that would
affect any other deliverables for ODK, and would scarcely affect the
outcome, but would potentially save a lot of unnecessary panic (and work).

On 21 November 2016 at 17:47, bpilorget notifications@github.com wrote:

Thank you for all the information. The problem is that Month 18, February
2017, is a very strict deadline for the Commission. I let @ClementPernet
https://github.com/ClementPernet and @nthiery
https://github.com/nthiery react if they have any suggestion to help
you with this.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#119 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAOzpo1LTPHGToF6oZyP2GGqTWGEB5P6ks5rAcsPgaJpZM4F5zGd
.

@nthiery
Copy link
Contributor

nthiery commented Nov 24, 2016

No strong opinion myself;so by default I will just trust your judgement. We can use the upcoming amendment to adjust the deliverable.

Opinion anyone? Especially potential users of such features?

@JohnCremona
Copy link
Contributor

JohnCremona commented Nov 24, 2016 via email

@KBelabas
Copy link
Member

KBelabas commented Nov 24, 2016 via email

@wbhart
Copy link
Contributor

wbhart commented Nov 24, 2016 via email

@ClementPernet
Copy link
Contributor

I agree with the previous comments: having a strong double large prime to deliver is good enough, and potientially an alpha version later on of the triple version even better. Keep up informed about your new strategy.

@wbhart wbhart changed the title D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiederman linear algebra over GF2 and the triple large prime variant. D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiederman linear algebra over GF2 and implement large prime variants. Dec 6, 2016
@wbhart
Copy link
Contributor

wbhart commented Dec 6, 2016

Wolfram Decker and I finally had a chance to catch up with Nicolas and we discussed the way we want to handle this deliverable.

Firstly, the deliverables D5.6, D5.7 and D5.13 have a common theme of factorisation. The polynomial arithmetic we are working towards has as an ultimate aim (beyond ODK) of doing fast multivariate factoring. The quadratic sieve is necessary for factoring integers, which has many applications in number theory.

But we've realised that the polynomial arithmetic is more important strategically than the integer factorisation. Therefore, since July I have been working flat out on the D5.13 deliverable and working towards our ultimate goal of fast multivariate factorisation in general. For example I implemented a univariate factorisation algorithm, which is some version of Karim Belabas et al.'s gradual feeding. And I've implemented single core versions of all the fast multivariate arithmetic we will need. This is already up to hundreds of times faster than what we had already, and nearly 10,000 lines of code!

As mentioned, Anders Jensen had been working on the quadratic sieve, but left us for a permanent job.

What we have decided is that at the review we'd like to highlight the really extensive work we've been doing on the polynomial arithmetic and multivariate gcd, along with the (single) large prime variant quadratic sieve that we've been working on, instead of rolling out a rushed and buggy, slow triple large prime variant sieve.

We've certainly done the Block-Wiederman implementation, we have a nearly working large prime variant sieve, which we will finish by the February due date, and we expect to complete parallelisation of the relation sieving as planned at the same time. We've also implemented an auxiliary factoring algorithm towards the double large prime variant, but not the graph theoretic parts.

To properly account for this change of emphasis from a research project on the triple large prime variant to a very practical quadratic sieve, plus the enormous amount of work we've done towards D5.13, I've sent Nicolas a pull request to amend the wording of this deliverable to simply say and "implement large prime variants", taking the emphasis off any specific version (single, double, triple, etc).

As Karim suggested, we'd like to still present an experimental version of the triple large prime variant at the end of the 48 months, but we don't think the emphasis should be on this. The triple large prime variant is a worthy research project, but not at all practical for most users. We'll instead focus on a really efficient implementation of the other large prime variants, which are practical.

None of these changes will affect any other deliverables for the ODK project. But it means we can move forward some really massive speedups for polynomial arithmetic, which we needed sooner rather than later, and still deliver a practical, working, parallelised, quadratic sieve now, with a full, practical double large prime variant quadratic sieve and a demonstration of the triple large prime variant to come later on in the project.

In particular, changing the emphasis in this way allows us to showcase what we have actually done, and prevents a massive panic that would have occurred in February. It also enables us to better meet our own strategic needs and to take the best advantage of the human resources we have available. It certainly doesn't indicate that we are behind schedule with the work that needs to be done. In fact, if anything, we have done more than we had originally hoped to do. It's a change of strategy, not a change of pace.

@wbhart
Copy link
Contributor

wbhart commented Jan 19, 2017

@KBelabas you said you were interested in timings in the < 90 digit range. Here is what I have so far. As you can see this Carrier-Wagstaff/Bradford-Monagan-Percival approach we have taken is absolute rubbish below 170 bits. But it is working nicely from about 170 bits onwards.

In future, we might use the Pari strategy for smaller factorisations. I (re)read the Pari source code the other night to remind myself how that works. In the past we've been unable to make SIMPQS robust enough for small factorisations with the polynomial strategy we were previously using, but the polynomial generation in Pari seems to be robust enough. (The worry is always running out of polynomials or having too many duplicate relations, both of which we were regularly hitting with the old simpqs I wrote years ago.)

Timings are in seconds on a 2.2GHz Opteron server (single core only), using Pari-2.9.1 and the latest development version of Flint.

n (bits)    Pari     Flint
=================
30     0.00016    0.11
40     0.00073    0.11
50       0.0012    0.11
60       0.0019    0.11
70       0.0031    0.11
80       0.0056    0.12
90       0.0082    0.13
100       0.013    0.15
110       0.022    0.17
120       0.051    0.19
130          0.11    0.24
140          0.22    0.33
150          0.43    0.55
160          0.79    0.99
170            1.8      1.4
180            3.8      3.0
190            8.6      4.8
200          17.6    10.2
210          38.4    20.9
220          67.7    33.2
230           199       78
240           257     140
250           444     248
260          1175    708
270          2199  1522
================

@KBelabas
Copy link
Member

KBelabas commented Jan 20, 2017 via email

@wbhart
Copy link
Contributor

wbhart commented Jan 20, 2017 via email

@wbhart
Copy link
Contributor

wbhart commented Jan 24, 2017

The parallel version of the sieve is working. It maxes out at about 8 threads on my machine, with a roughly 6x speedup for sufficiently large factorisations.

@wbhart wbhart changed the title D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiederman linear algebra over GF2 and implement large prime variants. D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiedemann linear algebra over GF2 and implement large prime variants. Jan 25, 2017
@nthiery
Copy link
Contributor

nthiery commented Feb 6, 2017

Dear M18 deliverable leaders,

Just a reminder that reports are due for mid-february, to buy us some time for proofreading, feedback, and final submission before February 28th. See our README for details on the process.

In practice, I'll be offline February 12-19, and the week right after will be pretty busy. Therefore, it would be helpful if a first draft could be available sometime this week, so that I can have a head start reviewing it.

Thanks in advance!

@wbhart wbhart added the FLINT label Feb 21, 2017
@nthiery
Copy link
Contributor

nthiery commented Feb 24, 2017

Note: I am now proofreading the issue description above.

@minrk minrk mentioned this issue Feb 24, 2017
@nthiery
Copy link
Contributor

nthiery commented Feb 24, 2017

Proofreading done. The issue description is fair game again. Same as for D5.5: this is looking good; I just left a little TODO.

@wbhart
Copy link
Contributor

wbhart commented Feb 27, 2017

I've dealt with the TODO and added a link to the blog post.

@wbhart
Copy link
Contributor

wbhart commented Feb 27, 2017

@nthiery All done I believe.

@nthiery
Copy link
Contributor

nthiery commented Feb 27, 2017 via email

@nthiery
Copy link
Contributor

nthiery commented Feb 27, 2017

Submitted! Thanks a lot @wbhart :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants