D5.6: Parallelise the relation sieving component of the Quadratic Sieve and implement a parallel version of Block-Wiedemann linear algebra over GF2 and implement large prime variants. #119
Comments
@ClementPernet (WP leader) and @wbhart (Lead beneficiary) |
The Block Wiedemann is in good shape. The triple large prime variant, not [...]. At this stage I am certain I can get the single large prime variant working [...]. On the other hand, I have made really significant progress towards the 48 [...]. In summary, I'm a little bit concerned at this point. Bill.
|
Thank you for all the information. The problem is that Month 18, February 2017, is a very strict deadline for the Commission. I'll let @ClementPernet and @nthiery react if they have any suggestions to help you with this. |
My post was simply meant to be an up-to-date progress report. We are of [...]. If I were to request any changes at this stage it would simply be to change [...]. The two circumstances that led to this were that Anders got a permanent [...]. So, to put it more concretely as a question: is it possible to change the [...]?
|
Ping! Just wondering if anyone has any comments about our suggestion to change [...]. I'd prefer to deliver something proven and solid than to take a risk on [...]. But it depends on what other ODK participants want, and whether this change [...]. Bill.
|
No strong opinion myself; so by default I will just trust your judgement. We can use the upcoming amendment to adjust the deliverable. Opinion, anyone? Especially potential users of such features? |
I think Bill's suggestion is eminently reasonable. From the outside, the difference between "double prime" and "triple prime" looks like a small incremental change, but of course implementing these is a lot of work, and it is extremely hard to estimate in advance how long it would take.
|
Triple large prime variation will not matter unless users run a major
factoring effort (few months of sieving first). IMHO it's a good
research goal, but certainly not a critical feature for users.
I'd release PPMPQS as a finished product on schedule. And (then or
later) PPP as a prototype, for alpha-testing and feedback.
Cheers,
K.B.
P.S. I'm interested in integer factoring benchmarks. (Esp. in the < 90 digits
range :-)
--
Karim Belabas, IMB (UMR 5251) Tel: (+33) (0)5 40 00 26 17
Universite de Bordeaux Fax: (+33) (0)5 40 00 21 23
351, cours de la Liberation http://www.math.u-bordeaux.fr/~kbelabas/
F-33405 Talence (France) http://pari.math.u-bordeaux.fr/ [PARI/GP]
|
Thanks for the support everyone!
We had a brief discussion locally about this, and I've actually come up
with an even better strategy and suggested it to Wolfram. It's closer to
what Karim recommends, but is slightly more involved. Therefore I'll wait
until Wolfram has had time to follow the whole thing up.
Of course I'll update the ticket once we've decided exactly which changes
we want to request, and hopefully people can chime in if they agree that
it's a sound plan.
Bill.
|
I agree with the previous comments: having a strong double large prime variant to deliver is good enough, and potentially an alpha version of the triple variant later on would be even better. Keep us informed about your new strategy. |
Wolfram Decker and I finally had a chance to catch up with Nicolas, and we discussed the way we want to handle this deliverable.

Firstly, the deliverables D5.6, D5.7 and D5.13 have a common theme of factorisation. The polynomial arithmetic we are working towards has as its ultimate aim (beyond ODK) fast multivariate factoring. The quadratic sieve is necessary for factoring integers, which has many applications in number theory. But we've realised that the polynomial arithmetic is more important strategically than the integer factorisation.

Therefore, since July I have been working flat out on the D5.13 deliverable and towards our ultimate goal of fast multivariate factorisation in general. For example, I implemented a univariate factorisation algorithm, which is a version of Karim Belabas et al.'s gradual feeding. And I've implemented single core versions of all the fast multivariate arithmetic we will need. This is already up to hundreds of times faster than what we had before, and nearly 10,000 lines of code! As mentioned, Anders Jensen had been working on the quadratic sieve, but left us for a permanent job.

What we have decided is that at the review we'd like to highlight the really extensive work we've been doing on the polynomial arithmetic and multivariate gcd, along with the (single) large prime variant quadratic sieve that we've been working on, instead of rolling out a rushed, buggy and slow triple large prime variant sieve. We've certainly done the Block Wiedemann implementation, we have a nearly working large prime variant sieve, which we will finish by the February due date, and we expect to complete parallelisation of the relation sieving as planned at the same time. We've also implemented an auxiliary factoring algorithm towards the double large prime variant, but not the graph theoretic parts.
To properly account for this change of emphasis from a research project on the triple large prime variant to a very practical quadratic sieve, plus the enormous amount of work we've done towards D5.13, I've sent Nicolas a pull request to amend the wording of this deliverable to simply say "implement large prime variants", taking the emphasis off any specific version (single, double, triple, etc).

As Karim suggested, we'd still like to present an experimental version of the triple large prime variant at the end of the 48 months, but we don't think the emphasis should be on this. The triple large prime variant is a worthy research project, but not at all practical for most users. We'll instead focus on a really efficient implementation of the other large prime variants, which are practical.

None of these changes will affect any other deliverables for the ODK project. But it means we can move forward some really massive speedups for polynomial arithmetic, which we needed sooner rather than later, and still deliver a practical, working, parallelised quadratic sieve now, with a full, practical double large prime variant quadratic sieve and a demonstration of the triple large prime variant to come later in the project.

In particular, changing the emphasis in this way allows us to showcase what we have actually done, and prevents a massive panic that would have occurred in February. It also enables us to better meet our own strategic needs and to take the best advantage of the human resources we have available. It certainly doesn't indicate that we are behind schedule with the work that needs to be done. In fact, if anything, we have done more than we had originally hoped to do. It's a change of strategy, not a change of pace. |
@KBelabas you said you were interested in timings in the < 90 digit range. Here is what I have so far. As you can see, this Carrier-Wagstaff/Bradford-Monagan-Percival approach we have taken is absolute rubbish below 170 bits. But it is working nicely from about 170 bits onwards.

In future, we might use the Pari strategy for smaller factorisations. I (re)read the Pari source code the other night to remind myself how that works. In the past we've been unable to make SIMPQS robust enough for small factorisations with the polynomial strategy we were previously using, but the polynomial generation in Pari seems to be robust enough. (The worry is always running out of polynomials or having too many duplicate relations, both of which we were regularly hitting with the old simpqs I wrote years ago.)

Timings are in seconds on a 2.2GHz Opteron server (single core only), using Pari-2.9.1 and the latest development version of Flint.
|
* wbhart [2017-01-19 18:29]:
@KBelabas you said you were interested in timings in the < 90 digit range. Here is what I have so far. As you can see, this Carrier-Wagstaff/Bradford-Monagan-Percival approach we have taken is absolute rubbish below 170 bits. But it is working nicely from about 170 bits onwards.
In future, we might use the Pari strategy for smaller factorisations. I (re)read the Pari source code the other night to remind myself how that works. In the past we've been unable to make SIMPQS robust enough, but it seems to be pretty robust in Pari.
n (bits)   Pari      Flint
==========================
30         0.00016   0.11
[...]
270        2199      1522
Interesting, thanks! Can you send me the benchmark code used for Pari ?
Cheers,
K.B.
|
It was very primitive @KBelabas. I just did something like the following in GP:

```
f(b,k) = {for(i=1, k, s = nextprime(random(2^(b/2))) * nextprime(random(2^(b/2))); factorint(s, 14))}
#
f(30, 100000)
```

for 30 bits, and so on. Of course I divided by the number of iterations to get the average time.
The number of iterations varied between the two systems, especially where
Flint was so much slower. But for Pari I generally tried to keep the total
time below about 60s, though in no case did I use less than 5 iterations
below 230 bits.
The numbers I factored from 230 bits onward are:
230: 1033005553733597551486054475473301536956276424884162978347314111210331
240: 694421043016559099938956613657502234908133400914093615344502918125299503
250: 846521313169799911792277779197149004149926546076302314337091688511275034397
260: 1068467576946595043900454053298713863804327606923444414074010450337936633096519
270: 618784014304300826386461364940832504316398894596588623706570810918031692462246587
The numbers you end up with, the way I'm generating them, are probably all
just slightly smaller than the stated number of bits. But both systems were
factoring the same numbers from 230 bits onward.
Bill.
|
The parallel version of the sieve is working. It maxes out at about 8 threads on my machine, with a roughly 6x speedup for sufficiently large factorisations. |
Dear M18 deliverable leaders, just a reminder that reports are due for mid-February, to buy us some time for proofreading, feedback, and final submission before February 28th. See our README for details on the process. In practice, I'll be offline February 12-19, and the week right after will be pretty busy. Therefore, it would be helpful if a first draft could be available sometime this week, so that I can get a head start reviewing it. Thanks in advance! |
Note: I am now proofreading the issue description above. |
Proofreading done. The issue description is fair game again. Same as for D5.5: this is looking good; I just left a little TODO. |
I've dealt with the TODO and added a link to the blog post. |
@nthiery All done I believe. |
Excellent, thanks! Submission planned for after lunch :-)
|
Submitted! Thanks a lot @wbhart :-) |
Context
One of the approaches toward OpenDreamKit's aim of High Performance Mathematical Computing is to explore the addition of very fine-grained parallelism (e.g. threading or SIMD) to some key computational components like Flint and MPIR. In this deliverable, we tackle two typical algorithms of computer algebra: integer factorization and block Wiedemann for finding kernel vectors of matrices over a finite field.
Whilst the threading of the quadratic sieve looks very promising, our conclusion is that the SIMD speedup we implemented for the block Wiedemann algorithm (and the threaded experiments we did) wasn't particularly useful for the quadratic sieve. We expect that only a number field sieve implementation, with truly massive matrices (millions of rows), would benefit from the kind of speedup we saw.
This doesn't mean that SIMD is not useful in general; it very much is: see for example D5.5 and D5.7, where SIMD proves to be a big win. It just hasn't proved to be a major benefit for a project like the quadratic sieve, where only a small part of the runtime is spent in the linear algebra phase of the algorithm.
The other improvements to the quadratic sieve described below, such as threading the relation sieving, where we get a speedup of up to 3 on 4 cores, together with various algorithmic improvements, turned out to be much more important in practice.
Report on writing a parallel implementation of the quadratic sieve
Problem description
The Quadratic Sieve is an algorithm for factoring integers n = pq, with n typically in the 15-90 decimal digit range. For this deliverable, we have implemented a quadratic sieve with the following features:
Results
Some partial progress towards this goal had been made before the OpenDreamKit project began: a partial implementation of a single large prime variant quadratic sieve by a Google Summer of Code student, based on a prior version for small integers only. However, it was a rough sketch only: it was not parallelised, it didn't compile or run, and it wasn't correct or complete.
We have completed the implementation and the code has already been merged into the `Flint` repository.

Our implementation is not competitive below 130 bits (~40 digits), due to the Carrier-Wagstaff/Bradford-Monagan-Percival combination. Here are some timings for larger factorisations comparing `Pari/GP` with our implementation in `Flint` on one and four cores. In our tests, no significant improvements were observed for larger numbers of cores, so we report only on the improvements for four cores here.
Future possibilities
Possible future projects could include:
Testing the quadratic sieve code
The `Flint` repository is available here: https://github.com/wbhart/flint2

To build and test the code mentioned above, you must have `GMP/MPIR` installed on your machine (refer to your system documentation for how to do this). Then do:

Full instructions on how to build `Flint` are available in the `Flint` documentation, available at the `Flint` website.

The quadratic sieve functionality is made available by including qsieve.h, and the main interface is via the function:
Blog post
A blog post about the improvements to the quadratic sieve in Flint is available at https://wbhart.blogspot.de/2017/02/integer-factorisation-in-flint.html .
Report on parallel block Wiedemann
Problem description
The Block Wiedemann algorithm computes kernel vectors of a matrix over a finite field. Along with the Block Lanczos algorithm, it is often used over GF2 in the Quadratic Sieve and the Number Field Sieve, but it also has applications modulo a general prime p.
Work done
We investigated parallelising the Block Wiedemann algorithm using the external library `M4RI` for basic linear algebra over GF2. This required three components to be written. These were implemented by Anders Jensen and Alex Best, and are available as modules for `Flint` and as an external implementation respectively.
Conclusions
Our experiments showed that it was possible to make use of the highly optimised SIMD arithmetic in the external library `M4RI` as a form of parallelism to speed up the Block Wiedemann algorithm, and that this outperformed Block Wiedemann using `M4RI` in threaded mode, at least for the size and sparsity of problem encountered in the quadratic sieve. The result was comparable to the Block Lanczos code already used in Flint for the quadratic sieve.

The sparse matrix library was not the focus of this deliverable, but will be expanded at a later date and included in Flint. Because of the additional external dependency on `M4RI`, the Block Wiedemann implementation isn't eligible for inclusion in Flint, but the code is available to members of the OpenDreamKit collaboration. Future work may see the removal of this external dependency so that it can be merged into Flint; however, we currently feel that this shouldn't be made a priority.

In fact, the quadratic sieve implementation we now have, even in parallel mode, doesn't have linear algebra as a bottleneck. This is because of the extremely small factor base sizes that are possible with the particular implementation of the quadratic sieve we completed as part of this project.