D3.11: HPC enabled Sage distribution #60
Comments
Suggestion: please prepare a demo in notebook format; see #289. |
@ClementPernet , can I have your input on this? |
@defeo : I plan to contribute a large part of the deliverable by the work in https://trac.sagemath.org/ticket/27444 and related releases https://trac.sagemath.org/ticket/26932 |
> @ClementPernet , can I have your input on this?
Good. I assume there won't be time to expose in Sage the other parallelizations achieved in Pari or Singular?
And is this going to be feasible? I see that ticket is not in needs-review yet.
Yes. I'm in Cernay, available at any time. |
What exactly would that mean? For parallel algorithms inside PARI itself, there is no need to expose anything: it should "just work" (modulo bugs like https://trac.sagemath.org/ticket/26608) |
Sorry about that, I forgot to put it in need-review once its dependency got merged. I just did it.
|
@ClementPernet, so, as Jeroen said, Pari MT could be used in Sage now, but it is disabled in the distribution because of this: https://trac.sagemath.org/ticket/26608. It should be possible to compile a Sage version with MT enabled, if we don't care about the docs, Jeroen is going to try this now. |
Following up on @embray's comment, I'll try to summarize the discussion we had at lunch. Pari has a few global variables (pointers to the stack, etc.) that were turned into thread-locals to allow multi-threading. This works well when Pari is the only piece of code responsible for creating threads; however, it breaks badly when outside code creates threads and then calls Pari (which is what happens in the docbuild), because the thread-locals cannot be initialized then. In other words, Pari is not thread-safe from above.

A. Rojas confirms that currently this only breaks the docbuild, because Sage does not use threads+Pari anywhere else, but it's obviously not acceptable to have Sage segfault whenever a user wants to write multithreaded Python code that eventually calls Pari. Similar problems have also happened with BLAS in the past, and likely affect NTL, HPC-GAP and other Sage dependencies.

There seems to be no easy fix. In the case of Pari, some hacks to Cypari may help limit the damage, but a complete solution would need to rethink Pari's memory management. In general, the only multi-threaded code that's easy to link against is code with no side effects.

@nthiery was encouraging us to write about these barriers in the deliverable. He was also asking how other projects solve this kind of multi-threading issue; however, Sage seems to be very special in this respect, in that it is willing to depend on multi-threaded code without imposing any constraints on its dependencies. @jdemeyer, @stevelinton, @embray, please feel free to correct me and add your thoughts. |
I agree, I don't think that there are many "other projects" like this. There is numpy, which uses BLAS, but BLAS seems to be thread-safe in this way so it doesn't have this problem. |
fflas-ffpack plays a similar role in SageMath, and since we designed it to be thread-safe, we can safely run it through Sage (this is https://trac.sagemath.org/ticket/27444, still waiting for a review, if possible before Friday's deadline!) |
And maybe suggest some best practices for library writers to make
their code «thread-safe from above» ?
|
I've pushed a draft outline for the report. I'm currently writing the section on parallel linear algebra.
@jdemeyer, @stevelinton, @embray, @defeo : could you write something in the number theory section about the problems which led to not being able to expose PARI-MT in SageMath ? |
I could do it, but is it really necessary? IMO we could go ahead and use multi-threaded PARI in Sage, just with some possible caveats mentioned in the documentation (and perhaps plans to improve the situation in the future, particularly from within cypari2). |
Then I guess the paragraph should be a mix of both: some experiments showcasing how it can be transparently used, and then a discussion explaining these caveats and the perspectives with cypari2. |
With a caveat: the MT speed-ups are in Pari 2.12, but Sage is still at 2.11.1, so activating Pari MT will not produce anything impressive at the moment. |
Sure, but there's nothing stopping one from using Sage with a newer version of Pari, unless there is some major blocker. Has no one tried? Does it matter? |
In fact a (slightly) less hackish way is to set the (PARI) global variable pari_mt_nbthreads to 1. The main problem is to make sure of doing "the right thing" in case of interrupts, signals, exceptions, etc., so as to reset pari_mt_nbthreads to its previous value when PARI functions return to SAGE, however control is restored to the SAGE toplevel. |
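To make Karim's suggestion concrete, here is a rough C-level sketch (not code from Sage or cypari2; `Z_factor` is just an arbitrary libpari entry point, and the snippet deliberately ignores the interrupt/exception handling he warns about):

```c
#include <pari/pari.h>

/* Illustrative only: temporarily force PARI's parallel engine down to a
 * single thread before calling into libpari from "above", then restore
 * the previous value.  Assumes a libpari built with --mt=pthread, where
 * the global pari_mt_nbthreads controls the default number of workers. */
GEN call_pari_sequentially(GEN n)
{
    ulong saved = pari_mt_nbthreads;
    GEN res;

    pari_mt_nbthreads = 1;      /* take the one-thread code path */
    res = Z_factor(n);          /* any libpari function */
    pari_mt_nbthreads = saved;  /* real code must also restore this on
                                   interrupts, signals and PARI errors */
    return res;
}
```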
Ah, so there is presumably a way to configure at runtime whether to use multithreading or not. Cool. Thanks Karim! |
Reimer is now a coauthor. I updated the abstract (see first comment on this issue) to reflect the new content. I fixed the sectioning (GAP,... were in the LinBox section). I went through the updated GAP/Singular/PARI sections without spotting typos. |
Ah, I just realised some of Karim's comments to me were about the intro to the report, not the GitHub ticket description. So I need to apply some typo corrections to that section if we haven't submitted. |
Ah I see why. It's not in the document I was editing. I'll try to figure out where it is and edit it there. |
> Ah I see why. It's not in the document I was editing. I'll try to figure out where it is and edit it there.
It's from the first comment in the issue:
#60
Let me know when it's updated, and I'll run an update (you could also
do it; see the main README, but at this stage don't bother).
It's getting late. I'll probably submit tomorrow morning anyway, to
give a bit of slack and avoid making some silly mistake upon submitting.
|
Ok, I fixed the intro, hopefully. A final read through by someone would be nice. |
Oh, there is one more thing. It refers to Task ... , but there's a number missing for the task. |
I just fixed another two obvious typos in the main report. There are some less obvious ones, but I think it's good enough given what time it is. |
* Nicolas M. Thiéry [2019-09-04 00:16]:
> Ok @wbhart, will do tomorrow early morning.
> @KBelabas: out of curiosity: does setting nthread=1 revert to plain
> sequential code, or just run the parallel code with just one thread?
There's usually no "plain sequential code" anymore, it's the same code base:
we run the parallel code in one thread (the main thread, obviously).
There are a few exceptions where we have two completely different algorithms
and the one which is easy to parallelize becomes far superior only when a
certain number of threads are available.
E.g. determinant of an integer matrix: Chinese remainders [inherently
parallel] vs random linear system + Hensel lifting [one part of which is
inherently sequential]; CRT is better when you have more than one or two
threads but too slow otherwise.
Cheers,
K.B.
|
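As a toy illustration of the Chinese-remainder strategy Karim describes above (plain C, not PARI's actual code, with a deliberately naive CRT step): the determinant is computed modulo several primes, each reduction being an independent task that could run in its own thread, and the integer value is recovered at the end. Real implementations choose enough primes for their product to exceed twice the Hadamard bound on |det M|.

```c
#include <stdio.h>

static long mod(long a, long p) { long r = a % p; return r < 0 ? r + p : r; }

/* determinant of a 2x2 integer matrix reduced mod p (one independent task) */
static long det2_mod_p(const long m[2][2], long p)
{
    return mod(mod(m[0][0], p) * mod(m[1][1], p)
             - mod(m[0][1], p) * mod(m[1][0], p), p);
}

int main(void)
{
    const long m[2][2] = { { 1234, -567 }, { 890, 4321 } }; /* det = 5836744 */
    const long p = 10007, q = 10009;

    /* the two residues are independent and could be computed concurrently */
    long dp = det2_mod_p(m, p);
    long dq = det2_mod_p(m, q);

    /* CRT: the unique 0 <= d < p*q with d = dp (mod p) and d = dq (mod q);
     * brute-force lift for brevity, PARI would use a modular inverse */
    long d = dp;
    while (d % q != dq) d += p;

    /* map to the symmetric range in case the determinant is negative */
    if (d > p * q / 2) d -= p * q;
    printf("det = %ld\n", d);   /* prints 5836744 */
    return 0;
}
```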
Thanks all and in particular @wbhart for this final push on the report. As I mentioned to @nthiery , I could not spend more time on it earlier this week, as I had to deal with classes and everything I had put on hold the whole previous week.
I'm currently proofreading the report which looks much better. I'll post as soon as I'm done with the proofreading. |
Seconding the thank you to @wbhart on this. I also did not fully realize that August 31 was not a hard deadline, but he enumerated the current difficulties in far more detail and completeness than I could have. I would also second that, with respect to producing an "HPC Sage", I think it suffices to say that ODK has contributed to tackling different approaches to parallelism in different systems, with "Sage" in this case being a kind of acid test for how all those different approaches work when thrown together in the same arena (conclusion: not well; further collaboration and careful design are needed). |
Done proof-reading. I pushed a typo fix and a clarification in the LinBox section. |
@KBelabas Re:
> about PARI not being thread-safe from above, it seems to me that a simple solution is to disable PARI's MT mechanism when called from a SAGE-level thread.

I'm not sure it's that simple, as the problem with MT PARI is not to do with PARI's multi-threaded code itself (e.g. I can and have reproduced the problem even by forcibly hard-coding `pari_mt_nbthreads = 1`, effectively disabling multi-threading in PARI).

It doesn't matter because when you build PARI with `--enable-tls` or `--mt=pthread` (which implies the former), it forces numerous global variables to use thread-local storage (e.g. `pari_mainstack` and `avma`). So if you start a thread in Python (which uses pthreads under the hood on POSIX systems), those TLS variables are uninitialized in the new thread, so as soon as you try to do *anything* with PARI [...] When PARI starts its own threads--at the beginning of their start-up routines--[...]

Something @jdemeyer and I discussed but have not to my knowledge tried yet, is that at least at the Python level (i.e. in cypari2) we could check whether or not we're running on the process's main thread, and also check (before any call is made, really) whether one of these critical TLS variables have been initialized (e.g. `avma`) and, if not, it could have the level of awareness about PARI to [...]
That would probably work as a first-pass fix, but it's not clear to me (not knowing PARI well enough) if that would be a fully stable solution. For starters, obviously, although it means one can technically try to use PARI outside the main thread, it won't be thread-safe without putting some kind of "global interpreter lock" around anything which uses the PARI interpreter or manipulates PARI objects (perhaps it would be better for now, in cypari, if it just disallowed use off the main thread, but at least threw a Python-level exception instead of segfaulting as it does currently). And then, in a more arbitrarily complex setting one runs afoul of other problems @wbhart mentioned, such as complex recursive data structures that might happen to have a PARI object (either from its heap, or worse yet its stack) floating around. This could be considered a mistake, perhaps, in a multi-threaded environment. But there is probably existing code that is not concerned about this, in which hard-to-trace bugs are likely to appear. None of this is a criticism or anything--it's just the reality of how quickly complicated and bothersome this problem can become. |
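For what it's worth, the check described above could look roughly like this at the C level (only a sketch, not actual cypari2 code; it relies on the assumption that the thread-local `avma` is still zero in a thread whose PARI state was never set up):

```c
#include <stdio.h>
#include <pari/pari.h>

/* Heuristic: in a TLS build, avma (PARI's thread-local stack pointer) is
 * only non-zero after pari_init() / pari_thread_start() ran in this thread. */
static int pari_state_ready(void)
{
    return avma != 0;
}

/* Refuse to touch libpari from an uninitialized thread instead of
 * segfaulting; cypari2 would raise a Python exception here instead. */
GEN safe_gsqr(GEN x)
{
    if (!pari_state_ready()) {
        fprintf(stderr, "PARI called from a thread with no PARI state\n");
        return NULL;
    }
    return gsqr(x);
}
```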
Hmm, I wonder if, at least for plain Python threads, it wouldn't suffice to just let the actual Python GIL also protect the PARI interpreter. That doesn't solve anything when it gets into arbitrarily complicated multi-level, multi-system parallelism. But at least for the simple case of Python + PARI it might be sufficient. |
Since cypari2 requires the GIL (it's using Python after all), that's already de-facto the case. |
I think this could work, but it would be helpful to have some official PARI API for this. |
Thank you everyone for contributing to this report! Beautiful collective intelligence at work. A bit late, but it was well worth it. Let's keep the discussion moving; and maybe we should have a joint workshop at some point. |
* E. M. Bray [2019-09-04 10:42]:
> @KBelabas Re:
> > about PARI not being thread-safe from above, it seems to me that a simple solution is to disable PARI's MT mechanism when called from a SAGE-level thread.
> I'm not sure it's that simple, as the problem with MT PARI is not to do with PARI's multi-threaded code itself (e.g. I can and have reproduced the problem even by forcibly hard-coding `pari_mt_nbthreads = 1`, effectively disabling multi-threading in PARI).
> It doesn't matter because when you build PARI with `--enable-tls` or `--mt=pthread` (which implies the former), it forces numerous global variables to use thread-local storage (e.g. `pari_mainstack` and `avma`). So if you start a thread in Python (which uses pthreads under the hood on POSIX systems), those TLS variables are uninitialized in the new thread, so as soon as you try to do *anything* with PARI--
> [...]
OK, thanks for the detailed explanation, it's starting to make sense. A few
things I don't understand [ knowing essentially nothing about how SAGE
launches PARI under the hood ]:
1) what happens with standard PARI ? I'd expect things to crash in an even
worse way in any pthread application, since PARI is not reentrant at all without
--enable-tls; so using the TLS version should only improve things, not break
them. Can you give us a specific (not too hard to reproduce) example
which works just fine with regular PARI but breaks with --enable-tls ?
2) Debian ships with both sagemath and PARI; it seems that configuring with
--enable-tls there doesn't harm sagemath. Here's the buildd log in that case:
https://buildd.debian.org/status/fetch.php?pkg=sagemath&arch=amd64&ver=8.6-6&stamp=1551504315&raw=0
3) I would indeed expect using the low-level pari_thread_init /
pari_thread_start to work (at least with pari_mt_nbthreads = 1). All
is described in Appendix D (together with an example in 'master' at least,
not sure whether it was already there in 2.12.0).
Cheers,
K.B.
|
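Karim's point 3 refers to the low-level threading interface documented in Appendix D of the libpari manual. A minimal sketch of the intended usage pattern might look like the following (illustrative only; the exact calls and signatures should be checked against the manual for the PARI version in use):

```c
#include <pthread.h>
#include <pari/pari.h>

/* Worker thread: must set up its own thread-local PARI state (avma,
 * pari_mainstack, ...) via pari_thread_start() before any libpari call. */
static void *worker(void *arg)
{
    struct pari_thread *pt = (struct pari_thread *)arg;
    GEN n = pari_thread_start(pt);   /* initializes TLS, returns the argument */
    GEN f = Z_factor(n);             /* now safe to use libpari */
    pari_printf("factor(%Ps) = %Ps\n", n, f);
    pari_thread_close();
    return NULL;
}

int main(void)
{
    pthread_t th;
    struct pari_thread pt;

    pari_init(8000000, 500000);                      /* main-thread state    */
    pari_thread_alloc(&pt, 8000000, stoi(1000003));  /* state for the worker */
    pthread_create(&th, NULL, worker, &pt);
    pthread_join(th, NULL);
    pari_close();
    return 0;
}
```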
I just finished proof-reading (don't tell my new employer :p ), and pushed a few typographic improvements. I have two minor questions @ClementPernet:
Thanks to everyone for this nice and very instructive report! |
I prefer to leave it this way, as computing the square somehow seems to imply that a dedicated algorithm for squaring is being used, which is not the case here. |
Done. |
For the most part this has nothing to do with Sage even; it just happens to occur there, Sage being one of the only applications (that I know of) that is using PARI as a third-party library. I believe it's not the only one though, but I forget. Jeroen would know.
I'm not exactly sure what you're asking for here--an example of what? If it's a question of multi-threaded code that uses PARI, no, I don't have such an example (and as you say it would not work anyways). I agree, of course, that for PARI's multi-threaded operations the TLS variables are necessary :) I can give one specific example involving multi-level parallelism in a sort of unfortunate way, where code that otherwise works would break. I'll get to that below since it's relevant to your other questions...
It does not, to my knowledge, impact building Sage or running the tests. The only place where it does have a visible impact currently (and this could easily change) is in building the Sage documentation. Here it was known by the debian-science community to break. I forget who narrowed the problem down to PARI with `--enable-tls`.

The reason for that issue is discussed originally here, and is annoyingly technical, but to summarize: the issue occurred due to a specific design choice of a construct in Python called [...]. In order for all this to work, [...]. This specific case could be alleviated somewhat if we used [...] |
That's easy: have an application with 2 threads A and B. Call [...] |
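The rest of that example is lost above, but the general failure mode discussed throughout this thread can be sketched as follows (a guess at the kind of scenario meant, not Erik's actual example): PARI is initialized in thread A only, and thread B, created by the application rather than by PARI, calls into libpari without `pari_thread_start()`, so with a `--enable-tls` / `--mt=pthread` build its thread-local `avma` and `pari_mainstack` are still zero and the first allocation typically segfaults.

```c
#include <pthread.h>
#include <pari/pari.h>

static void *thread_B(void *unused)
{
    (void)unused;
    /* Missing pari_thread_alloc()/pari_thread_start(): that is the bug.
     * This thread's TLS copies of avma/pari_mainstack were never set up. */
    GEN x = stoi(42);   /* writes through an uninitialized stack pointer */
    gsqr(x);            /* in practice the process crashes around here   */
    return NULL;
}

int main(void)
{
    pthread_t tB;

    pari_init(8000000, 500000);   /* thread A owns the only PARI state */
    pthread_create(&tB, NULL, thread_B, NULL);
    pthread_join(tB, NULL);
    pari_close();
    return 0;
}
```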
The primary use of computational mathematics software is to perform experimental mathematics, for example testing a conjecture on as many instances as possible, of sizes as large as possible. In this perspective, users seek computational efficiency and the ability to harness the power of a variety of modern architectures. This is particularly relevant in the context of the OpenDreamKit Virtual Research Environment toolkit, which is meant to reduce entry barriers by providing a uniform user experience from multicore personal computers -- a most common use case -- to high-end servers or even clusters. Hence, in the realm of this project, we use the term High Performance Computing (HPC) in a broad sense, covering all the above architectures with appropriate parallel paradigms (SIMD, multiprocessing, distributed computing, etc.).
Work Package 5 has resulted in either enabling or drastically enhancing the high performance capabilities of several computational mathematics systems, namely `Singular` (D5.13 #111), `GAP` (D5.15 #113) and `PARI` (D5.16, #114), or of the dedicated library `LinBox` (D5.12 #110, D5.14 #112).

Bringing further HPC to a general purpose computational mathematics system such as `SageMath` is particularly challenging; indeed, such systems need to cover a broad -- if not exhaustive -- range of features, in a high-level and user-friendly environment, with competitive performance. To achieve this, they are composed from the ground up as integrated systems that take advantage of existing highly tuned dedicated libraries or sub-systems such as those aforementioned.

We report here on the exploratory work carried out in Task 3.5 (#54) to expose the HPC capabilities of components at the end-user level of an integrated system such as `SageMath`.

Our first test bed is the `LinBox` library. Its multicore parallelism features have been successfully integrated in `Sage`, with a simple API letting the user control the desired level of parallelism. We demonstrate the efficiency of the composition with experiments. Going beyond expectations, the outcome has been integrated in the next production release of `SageMath`, hence immediately benefiting thousands of users.

We proceed by detailing the unique challenges posed by each of the `Singular`, `PARI`, and `GAP` systems. The common cause is that they were created decades ago as standalone systems, represent hundreds of man-years of development, and were only recently redesigned to be usable as parallel libraries. Some successes were nevertheless obtained in experimental setups, and pathways to production are discussed.

We conclude with lessons learned in the course of this work and through expertise sharing within and beyond OpenDreamKit: the levels of integration one may wish for when composing parallel computational mathematics software, and the challenges such integration would raise.