
D3.11: HPC enabled Sage distribution #60

minrk opened this issue Sep 8, 2015 · 98 comments

@minrk commented Sep 8, 2015

The primary use of computational mathematics software is to perform experimental mathematics, for example to test a conjecture on as many instances as possible, of sizes as large as possible. In this perspective, users seek computational efficiency and the ability to harness the power of a variety of modern architectures. This is particularly relevant in the context of the OpenDreamKit Virtual Research Environment toolkit, which is meant to reduce entry barriers by providing a uniform user experience from multicore personal computers -- a most common use case -- to high-end servers or even clusters. Hence, in the realm of this project, we use the term High Performance Computing (HPC) in a broad sense, covering all the above architectures with appropriate parallel paradigms (SIMD, multiprocessing, distributed computing, etc.).

Work Package 5 has resulted in either enabling or drastically enhancing the high performance capabilities of several computational mathematics systems, namely Singular (D5.13 #111), GAP (D5.15 #113) and PARI (D5.16, #114), or of the dedicated library LinBox (D5.12 #110, D5.14 #112).

Bringing further HPC to a general purpose computational mathematics system such as SageMath is particularly challenging; indeed, such systems need to cover a broad -- if not exhaustive -- range of features, in a high-level and user-friendly environment, with competitive performance. To achieve this, they are composed from the ground up as integrated systems that take advantage of existing highly tuned dedicated libraries or sub-systems such as those mentioned above.

We report here on the exploratory work carried out in Task 3.5 (#54) to expose the HPC capabilities of components at the end-user level of an integrated system such as SageMath.

Our first test bed is the LinBox library. Its multicore parallelism features have been successfully integrated in Sage, with a simple API letting the user control the desired level of parallelism. We demonstrate the efficiency of the composition with experiments. Going beyond expectations, the outcome has been integrated in the next production release of SageMath, hence immediately benefiting thousands of users.
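As a sketch of what that API looks like from a user's perspective -- based on Sage's generic `Parallelism` helper, with the `'linbox'` field coming from ticket 27444; the exact syntax and namespace should be checked against the released documentation:

```
sage: Parallelism().set('linbox', nproc=4)   # request up to 4 threads for LinBox routines
sage: M = random_matrix(GF(65537), 2000, 2000)
sage: d = M.det()                            # dense linear algebra now runs multithreaded
sage: Parallelism().set('linbox', nproc=1)   # back to sequential behaviour
```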

We proceed by detailing the unique challenges posed by each of the Singular, PARI, and GAP systems. The common cause is that they were created decades ago as standalone systems, represent hundreds of man-years of development, and were only recently redesigned to be usable as parallel libraries. Some successes were nevertheless obtained in experimental setups, and pathways to production are discussed.

We conclude with lessons learned in the course of this work and through expertise sharing within and beyond OpenDreamKit: the levels of integration one may wish for when composing parallel computational mathematics software, and the challenges such integration would raise.

@nthiery commented Aug 26, 2019

Suggestion: please prepare a demo in notebook format; see #289.

@defeo commented Aug 26, 2019

@ClementPernet , can I have your input on this?

@ClementPernet commented Aug 26, 2019

@defeo: I plan to contribute a large part of the deliverable through the work in https://trac.sagemath.org/ticket/27444 and related releases https://trac.sagemath.org/ticket/26932
Btw, having ticket 27444 reviewed by the deliverable due date would be fantastic!
Can we chat or have a call tomorrow (Tues)?

@defeo commented Aug 26, 2019

@ClementPernet , can I have your input on this?

@defeo: I plan to contribute a large part of the deliverable through the work in https://trac.sagemath.org/ticket/27444 and related releases https://trac.sagemath.org/ticket/26932

Good. I assume there won't be time to expose in Sage the other parallelizations achieved in Pari or Singular?

Btw having this ticket 27444 reviewed by the deliverable due date would be fantastic!

And is this going to be feasible? I see that ticket is not in needs-review yet.

Can we chat or have a call tomorrow (Tues)?

Yes. I'm in Cernay, available at any time.

@jdemeyer commented Aug 26, 2019

expose in Sage the other parallelizations achieved in Pari

What exactly would that mean? For parallel algorithms inside PARI itself, there is no need to expose anything: it should "just work" (modulo bugs like https://trac.sagemath.org/ticket/26608)

@ClementPernet commented Aug 26, 2019

Btw having this ticket 27444 reviewed by the deliverable due date would be fantastic!

And is this going to be feasible? I see that ticket is not in needs-review yet.

Sorry about that, I forgot to put it in needs-review once its dependency got merged. I just did it.

Yes. I'm in Cernay, available at any time.
Good. I'll get in touch tomorrow morning.

@defeo commented Aug 27, 2019

@ClementPernet, so, as Jeroen said, Pari MT could be used in Sage now, but it is disabled in the distribution because of this: https://trac.sagemath.org/ticket/26608. It should be possible to compile a Sage version with MT enabled if we don't care about the docs; Jeroen is going to try this now.

@defeo commented Aug 27, 2019

Following up on @embray's comment I'll try to summarize the discussion we had at lunch.

Pari has a few global variables (pointers to the stack, etc.) that were turned into thread-locals to allow multi-threading. This works well when Pari is the only piece of code responsible for creating threads; however, it breaks badly when outside code creates threads and then calls Pari (which is what happens in the docbuild), because the thread-locals cannot be initialized then. Said otherwise, Pari is not thread-safe from above.
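This failure mode can be reproduced in miniature with pure Python: a `threading.local` plays the role of Pari's thread-local globals, and a value initialized on the main thread simply does not exist in a thread created by the host application (the names here are illustrative, not Pari's API):

```python
import threading

state = threading.local()   # models Pari's TLS globals (pari_mainstack, avma, ...)
state.stack = "set by pari_init() on the main thread"

seen = {}

def worker():
    # A thread created "from above" by the host application: nothing ran
    # Pari's per-thread initialization here, so the TLS slot is empty.
    seen["initialized"] = hasattr(state, "stack")

t = threading.Thread(target=worker)
t.start()
t.join()
print(seen["initialized"])  # False: the new thread sees uninitialized state
```

In C, dereferencing the analogous uninitialized TLS pointer is what turns this from an `AttributeError` into a segfault.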

A. Rojas confirms that currently this is only breaking the docbuild, because Sage does not use threads+Pari anywhere else, but it's obviously not acceptable to have Sage segfault whenever a user wants to write multithreaded Python code that eventually calls Pari.

Similar problems have also happened with BLAS in the past, and likely affect NTL, HPC-GAP and other Sage dependencies. There seems to be no easy fix. In the case of Pari, some hacks to Cypari may help limit the damage, but a complete solution would need to rethink Pari's memory management. In general, the only multi-threaded code that's easy to link is that which has no side-effects.

@nthiery encouraged us to write about these barriers in the deliverable. He also asked how other projects solve this kind of multi-threading issue; however, Sage seems to be very special in this respect, in that it is willing to depend on multi-threaded code without imposing any constraints on its dependencies.

@jdemeyer, @stevelinton, @embray, please feel free to correct me and add your thoughts.

@jdemeyer commented Aug 27, 2019

He was also asking how other projects solve this kind of multi-threading issues, however Sage seems to be very special in this respect, in that it is willing to depend on multi-threaded code without imposing any constraints on its dependencies.

I agree, I don't think there are many "other projects" like this. There is numpy, which uses BLAS, but BLAS seems to be thread-safe in this way so it doesn't have this problem.

@ClementPernet commented Aug 27, 2019

There is numpy, which uses BLAS but BLAS seems to be thread-safe in this way so it doesn't have this problem.

fflas-ffpack plays a similar role in SageMath, and since we designed it to be thread-safe, we can safely run it through Sage (this is https://trac.sagemath.org/ticket/27444, still waiting for a review, if possible before Friday's deadline!)
It is of course a much lower-level library than libpari or libgap.
I agree we should write about these barriers and explain how an informed user can use MT-PARI, but that it cannot be enabled by default in a Sage release.

@nthiery commented Aug 27, 2019

@ClementPernet commented Aug 29, 2019

I've pushed a draft outline for the report. I'm currently writing the section on parallel linear algebra.

@nthiery was encouraging to write about these barriers in the deliverable. He was also asking how other projects solve this kind of multi-threading issues, however Sage seems to be very special in this respect, in that it is willing to depend on multi-threaded code without imposing any constraints on its dependencies.

@jdemeyer, @stevelinton, @embray, @defeo : could you write something in the number theory section about the problems which lead to not being able to expose PARI-MT in SageMath ?

@embray commented Aug 29, 2019

I could do it, but is it really necessary? IMO we could go ahead and use multi-threaded PARI in Sage, just with some possible caveats mentioned in the documentation (and perhaps plans to improve the situation in the future, particularly from within cypari2).

@ClementPernet commented Aug 29, 2019

Then I guess, the paragraph should be a mix of both: some experiments showcasing how it can be transparently used, and then a discussion explaining these caveats and the perspectives with cypari2.
Please feel free to start writing on the second aspect.

@defeo commented Aug 29, 2019

Then I guess, the paragraph should be a mix of both: some experiments showcasing how it can be transparently used, and then a discussion explaining these caveats and the perspectives with cypari2.

With a caveat: the MT speed-ups are in Pari 2.12, but Sage is still at 2.11.1, so activating Pari MT will not produce anything impressive at the moment.

@embray commented Aug 29, 2019

Sure, but there's nothing stopping one from using Sage with a newer version of Pari, unless there is some major blocker. Has no one tried? Does it matter?

@nthiery commented Sep 3, 2019

Should Reimer be a coauthor?

@KBelabas commented Sep 3, 2019

@defeo, @jdemeyer: about PARI not being thread-safe from above, it seems to me that a simple solution is to disable PARI's MT mechanism when called from a SAGE-level thread.
PARI itself does this to avoid its own MT engine being called recursively, but we currently do not export the mechanism.

Can you try a quick hack:

  1. I'm assuming you have a sane SAGE way of knowing whether you are in a secondary thread or in SAGE's main thread. [ In PARI we have the undocumented function mt_is_parallel() which returns 1 iff we must only use the MT engine in 'single' mode ]
  2. in the place responsible for calling a PARI function, check whether you are in such a secondary thread and if so set some ridiculous global variable
  3. in src/mt/pthread.c:mt_queue_start_lim(), change the 3rd line to
    if (pari_mt || lim <= 1 || my_ridiculous_global_variable_is_set)
  4. reset the global variable when PARI's function returns [ and possibly beware of exceptions being thrown, etc ]

I believe this should prevent the PARI SEGVs you observed: everything should "just work" (except that PARI-MT will be disabled in SAGE's secondary threads).

If it does, I will investigate how to do this cleanly...
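Karim's guard ("only use the MT engine in 'single' mode when already inside a secondary thread") translates to a generic pattern; a minimal Python sketch, with made-up names standing in for PARI's mt_queue machinery:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def mt_queue_start(tasks, run, nproc=4):
    # PARI-style guard: when called from a secondary thread (the
    # mt_is_parallel() situation), degrade to 'single' mode, i.e.
    # run the tasks sequentially instead of spawning nested workers.
    if threading.current_thread() is not threading.main_thread():
        return [run(t) for t in tasks]
    # On the main thread, fan the tasks out to a worker pool.
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        return list(pool.map(run, tasks))
```

Called from the main thread this runs in parallel; called from any secondary thread it silently falls back to sequential execution, which is exactly the behaviour that avoids the SEGVs.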

@wbhart commented Sep 3, 2019

@nthiery Reimer did contribute. In many cases I wrote down in pen his exact words as he explained many things to me, so he's really directly an author of many sentences.

@KBelabas commented Sep 3, 2019

In fact a (slightly) less hackish way is to set the (PARI) global variable pari_mt_nbthreads to 1 instead of "some ridiculous global variable". Then you don't even need to change PARI and can skip step 3.

The main problem is to make sure of doing "the right thing" in case of interrupts, signals, exceptions, etc., so as to reset pari_mt_nbthreads to its previous value when a PARI function returns to SAGE, however control is restored to the SAGE toplevel.

@nthiery commented Sep 3, 2019

Ah, so there is presumably a way to configure at runtime whether to use multithreading or not. Cool. Thanks Karim!
Not sure whether we will have a chance to test this before the submission of the deliverable (@defeo and @jdemeyer both started new jobs yesterday!), but I'll now have a look at the text to see if we can slightly adjust the phrasing.

@nthiery commented Sep 3, 2019

Reimer is now a coauthor. I updated the abstract (see first comment on this issue) to reflect the new content. I fixed the sectioning (GAP,... were in the LinBox section). I went through the updated GAP/Singular/PARI sections without spotting typos.
PDF updated.

@wbhart commented Sep 3, 2019

Ah, I just realised some of Karim's comments to me were about the intro to the report, not the GitHub ticket description. So I need to apply some typo corrections to that section if we haven't submitted.

@wbhart commented Sep 3, 2019

Ah I see why. It's not in the document I was editing. I'll try to figure out where it is and edit it there.

@nthiery commented Sep 3, 2019

@wbhart commented Sep 3, 2019

Ok, I fixed the intro, hopefully. A final read through by someone would be nice.

@wbhart commented Sep 3, 2019

Oh, there is one more thing. It refers to Task ... , but there's a number missing for the task.

@nthiery commented Sep 3, 2019

Ok @wbhart , will do tomorrow early morning.
@KBelabas: out of curiosity: does setting nthread=1 revert to plain sequential code, or just run the parallel code with just one thread?

@wbhart commented Sep 3, 2019

I just fixed another two obvious typos in the main report. There are some less obvious ones, but I think it's good enough given what time it is.

@KBelabas commented Sep 4, 2019

@ClementPernet commented Sep 4, 2019

Thanks all, and in particular @wbhart, for this final push on the report. As I mentioned to @nthiery, I could not spend more time on it early this week, as I had to deal with classes and everything I had put on hold the whole previous week.

Oh, there is one more thing. It refers to Task ... , but there's a number missing for the task.

I fixed it by referring to Task 3.5 in the Github description.

I'm currently proofreading the report, which looks much better. I'll post as soon as I'm done with the proofreading.

@embray commented Sep 4, 2019

Seconding the thank you to @wbhart on this. I also did not fully realize that August 31 was not a hard deadline, but he enumerated the current difficulties in far more detail and completeness than I could have. I would also second that, with respect to producing an "HPC Sage", I think it suffices to say that ODK has contributed to tackling different approaches to parallelism in different systems, with "Sage" in this case being a kind of acid test for how all those different approaches work when thrown together in the same arena (conclusion: not well; further collaboration and careful design are needed).

@ClementPernet commented Sep 4, 2019

Done proofreading. I pushed a typo fix and a clarification in the LinBox section.
Good to go on my side.

@embray commented Sep 4, 2019

@KBelabas Re:

about PARI not being thread-safe from above, it seems to me that a simple solution is to disable PARI's MT mechanism when called from a SAGE-level thread.

I'm not sure it's that simple, as the problem with MT PARI is not to do with PARI's multi-threaded code itself (e.g. I can and have reproduced the problem even by forcibly hard-coding pari_mt_nbthreads = 1, effectively disabling multi-threading in PARI).

It doesn't matter, because when you build PARI with --enable-tls or --mt=pthread (which implies the former), numerous global variables are forced to use thread-local storage (e.g. pari_mainstack and avma). So if you start a thread in Python (which uses pthreads under the hood on POSIX systems), those TLS variables are uninitialized in the new thread, and as soon as you try to do anything with PARI--💥.

When PARI starts its own threads, at the beginning of their start-up routine (mt_queue_run) they call a function pari_thread_start, which in turn does things like call pari_mainstack_use to initialize the global (TLS) pari_mainstack pointer to the thread's mainstack, and also calls functions like pari_thread_init() and pari_thread_init_varstate(), which in turn are responsible for re-initializing some of the other critical global state (e.g. pari_thread_init_primetab).

Something @jdemeyer and I discussed but have not, to my knowledge, tried yet: at least at the Python level (i.e. in cypari2) we could check whether or not we're running on the process's main thread, and also check (before any call is made, really) whether one of these critical TLS variables has been initialized (e.g. avma); if not, cypari2 could have enough awareness about PARI to

  • Keep a copy of the mainstack pointer in the main thread
  • Call pari_mainstack_use to set it back to the main thread's PARI stack
  • Manually call something like pari_thread_init.

That would probably work as a first-pass fix, but it's not clear to me (not knowing PARI well enough) whether that would be a fully stable solution. For starters, obviously, although it means one can technically try to use PARI outside the main thread, it won't be thread-safe without putting some kind of "global interpreter lock" around anything that uses the PARI interpreter or manipulates PARI objects (perhaps it would be better for now, in cypari, if it just disallowed use off the main thread, but at least threw a Python-level exception instead of segfaulting as it does currently).
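That last option (disallow use off the main thread, but raise instead of segfaulting) is easy to prototype; a sketch with hypothetical names, not actual cypari2 code:

```python
import threading

def require_main_thread(func):
    # Hypothetical cypari2-style guard: fail loudly with a Python
    # exception instead of letting uninitialized TLS state segfault.
    def wrapper(*args, **kwargs):
        if threading.current_thread() is not threading.main_thread():
            raise RuntimeError("PARI may only be used from the main thread")
        return func(*args, **kwargs)
    return wrapper

@require_main_thread
def pari_call(x):
    return x + 1   # stands in for a real call into libpari
```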

And then, in a more arbitrarily complex setting one runs afoul of other problems @wbhart mentioned, such as complex recursive data structures that might happen to have a PARI object (either from its heap, or worse yet its stack) floating around. This could be considered a mistake, perhaps, in a multi-threaded environment. But there is probably existing code that is not concerned about this, in which hard-to-trace bugs are likely to appear.

None of this is a criticism or anything--it's just the reality of how quickly complicated and bothersome this problem can become.

@embray commented Sep 4, 2019

Hmm, I wonder if, at least for plain Python threads, it wouldn't suffice to just let the actual Python GIL also protect the PARI interpreter. That doesn't solve anything when it gets into arbitrarily complicated multi-level, multi-system parallelism. But at least for the simple case of Python + PARI it might be sufficient.

@jdemeyer commented Sep 4, 2019

Hmm, I wonder if, at least for plain Python threads, it wouldn't suffice to just let the actual Python GIL also protect the PARI interpreter.

Since cypari2 requires the GIL (it's using Python after all), that's already de-facto the case.

@jdemeyer commented Sep 4, 2019

Something @jdemeyer and I discussed but have not, to my knowledge, tried yet: at least at the Python level (i.e. in cypari2) we could check whether or not we're running on the process's main thread, and also check (before any call is made, really) whether one of these critical TLS variables has been initialized (e.g. avma); if not, cypari2 could have enough awareness about PARI to

* Keep a copy of the mainstack pointer in the main thread

* Call `pari_mainstack_use` to set it back to the main thread's PARI stack

* Manually call something like `pari_thread_init`.

I think this could work, but it would be helpful to have some official PARI API for this.

@nthiery added the Submitted label and removed the needs review label Sep 4, 2019
@nthiery commented Sep 4, 2019

Thank you everyone for contributing to this report! Beautiful collective intelligence at work. A bit late, but it was well worth it.
I am looking forward to nice demos at the review.
And maybe a follow-up in the form of a blog post or an expanded paper.

Let's keep the discussion moving; and maybe we should have a joint workshop at some point.

@KBelabas commented Sep 5, 2019

@defeo commented Sep 5, 2019

(@defeo and @jdemeyer both started on a new job yesterday!)

I just finished proof-reading (don't tell my new employer :p ), and pushed a few typographic improvements.

I have two minor questions @ClementPernet:

  • "the product of a matrix ... with itself". Any reason not to say "the square"?
  • Amdahl's law [citation needed]

Thanks to everyone for this nice and very instructive report!

@ClementPernet commented Sep 6, 2019

* "the product of a matrix ... with itself". Any reason not to say "the square"?

I prefer to leave it this way, as computing "the square" would somehow seem to imply that a dedicated squaring algorithm is being used, which is not the case here.

* Amdahl's law [citation needed]

Done.

@embray commented Sep 6, 2019

@KBelabas

knowing essentially nothing about how SAGE launches PARI behind the hood

For the most part this has nothing to do with Sage even; it just happens to occur there because Sage is one of the only applications (that I know of) that uses PARI as a third-party library. I believe it's not the only one though, but I forget. Jeroen would know.

what happens with standard PARI ? I'd expect things to crash in an even
worse way in any pthread application, since PARI is not reentrant at all without
--enable-tls; so using the TLS version should only improve things, not break
them. Can you give us a specific (not too hard to reproduce) example
which works just fine with regular PARI but breaks with --enable-tls ?

I'm not exactly sure what you're asking for here--an example of what? If it's a question of multi-threaded code that uses PARI, no, I don't have such an example (and, as you say, it would not work anyway). I agree, of course, that for PARI's multi-threaded operations the TLS variables are necessary :) I can give one specific example involving multi-level parallelism in a sort of unfortunate way, where code that otherwise works would break. I'll get to that below since it's relevant to your other questions...

Debian ships with both sagemath and PARI; it seems that configuring with
--enable-tls there doesn't harm sagemath. Here's the build log in that case

It does not, to my knowledge, impact building Sage or running the tests. The only place where it currently has a visible impact (and this could easily change) is in building the Sage documentation. This was known by the debian-science community to break. I forget who narrowed the problem down to PARI with --enable-tls--I think it might have been me, but possibly someone else first identified that as the difference that made the difference, and I just investigated later. In any case, it was a known issue that has since been patched in Debian to work around it. I don't know exactly what patch they used, but I think it might have been to not use Sage's parallelized documentation build system--when running the docbuild in a single process the issue did not occur.

The reason for that issue is discussed originally here, and is annoyingly technical, but to summarize: The issue occurred due to a specific design choice of a construct in Python called multiprocessing.Pool, which is used for building Sage's docs in parallel processes. Pool creates a pool of worker subprocesses to which any number of tasks can be farmed out simultaneously (it has a work queue), and allows results of different tasks to be returned asynchronously. It also allows a task limit on individual worker processes, after which they are killed and replacement workers spawned in their place (useful e.g. to avoid memory leaks in long-running processes, etc.).

In order for all this to work, Pool starts a thread which is responsible for creating the worker processes, which it does using fork() on POSIX systems. Normally, in a single-threaded application, if we just fork() from the parent process all the memory is copied over and we can keep using PARI as before. However, because Pool calls fork() from a secondary thread, all of PARI's thread-local globals are uninitialized there (because it's a new thread). Once fork() is called and execution continues in the child process (which, in the case of Pool, does not exec() but rather goes through a complex bootstrap process in order to run whatever Python function the worker has been tasked to execute), PARI is uninitialized and blows up (even when running otherwise serial code).
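The fork-from-a-thread scenario can be reproduced without PARI, again using a `threading.local` as a stand-in for the TLS globals (POSIX only; the names are illustrative):

```python
import os
import threading

state = threading.local()        # stands in for PARI's TLS globals
state.stack = "set by pari_init() on the main thread"

def fork_from_helper_thread():
    # Like Pool's worker-spawning thread: fork() here, so the child
    # continues from *this* thread's context, where the TLS slot
    # was never initialized.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                                   # child process
        os.write(w, b"1" if hasattr(state, "stack") else b"0")
        os._exit(0)
    os.waitpid(pid, 0)
    return os.read(r, 1)

result = []
t = threading.Thread(target=lambda: result.append(fork_from_helper_thread()))
t.start()
t.join()
# result[0] == b"0": the child saw uninitialized state, as PARI does in the docbuild
```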

This specific case could be alleviated somewhat if we used pthread_atfork() to make sure PARI gets re-initialized (if necessary) after a fork() (again, under the circumstances of a plain fork() this isn't necessary, but it is when fork()-ing from a thread).
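Python exposes the same hook as `os.register_at_fork()`, which can model what such a `pthread_atfork()` child handler would do; the re-initialization function below is a stand-in for PARI's real per-thread setup, not an existing API:

```python
import os
import threading

state = threading.local()        # stands in for PARI's TLS globals

def reinit_pari_state():
    # Child handler: re-run the per-process initialization, as a
    # pthread_atfork() hook would re-run pari_thread_init() etc.
    state.stack = "re-initialized in the child"

os.register_at_fork(after_in_child=reinit_pari_state)

def fork_from_helper_thread():
    r, w = os.pipe()
    pid = os.fork()              # forked from a secondary thread, as Pool does
    if pid == 0:                                   # child process
        os.write(w, b"1" if hasattr(state, "stack") else b"0")
        os._exit(0)
    os.waitpid(pid, 0)
    return os.read(r, 1)

out = []
t = threading.Thread(target=lambda: out.append(fork_from_helper_thread()))
t.start()
t.join()
# out[0] == b"1": the hook restored the state the fork would otherwise lose
```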

@jdemeyer commented Sep 6, 2019

Can you give us a specific (not too hard to reproduce) example which works just fine with regular PARI but breaks with --enable-tls ?

That's easy: have an application with 2 threads A and B. Call pari_init() from thread A and any PARI function (a simple cgetg for example) in thread B.
