Restore ability to use MPI implementations with unknown ABIs #574
So I see two solutions. The first is to provide a script that generates a constants file. Secondly, we could use MPItrampoline's ABI.
Re/ the first option: I am not sure we need to provide this with the described level of automation; I think the ability to generate the constants file would already be enough. Re/ the second option: I cannot really say, since I do not know what following this path would entail.
(I didn't see this discussion earlier; we discussed this on Discord.) The MPIconstants project hosts a small C program that extracts the constants from any MPI implementation. We could build and run this at configure time. This generates two files. One is a Julia file that defines the compile-time constants. The other is a C file that extracts the load-time constants at run time and defines global variables that can be read from Julia; this file needs to be built as a shared library and loaded from Julia. Thus the process of making a new MPI implementation available to Julia is quite automated. If we replace the cmake build system with custom code, then we could easily run this at configure time.
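For concreteness, the two generated artifacts could look roughly like the following sketch (constant names follow the MPI standard, but the values and the library/symbol names are illustrative placeholders, not output from a real implementation):

```julia
# constants.jl -- compile-time constants, as a generated Julia file.
# A real file would contain the values printed by the extractor program
# compiled against the vendor's mpi.h; these values are placeholders.
const MPI_Comm = Cint                    # how this implementation represents handles
const MPI_COMM_WORLD = Cint(0x44000000)
const MPI_INT = Cint(0x4c000405)
const MPI_STATUS_SIZE = 5                # layout information for the status struct

# Load-time constants live in the generated C file, built as a shared
# library; Julia would read them at run time along the lines of
#   world = unsafe_load(cglobal((:my_MPI_COMM_WORLD, libmpiconstants), MPI_Comm))
```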
This sounds like a potential solution. However, I strongly suggest that if we go down this path, the ability to compile a C program to figure out the constants should be included in MPI.jl itself and not rely on downloading yet another repository.
Question from the sideline: Wasn't the idea to not have a build step anymore? (Oh, I see, Valentin already mentioned this above.)
For clarity: the build step would only be necessary if someone uses an MPI implementation with an unknown ABI.
As just discussed during the monthly Julia for HPC call (thanks also to @mkitti @giordano @williamfgc for the discussion): maybe it would be sufficient for now to restore the ability to use a custom ABI file (such as https://github.com/JuliaParallel/MPI.jl/blob/master/src/consts/mpt.jl or https://github.com/JuliaParallel/MPI.jl/blob/master/src/consts/mpich.jl) on a machine where the system MPI implementation is not compatible with any of the ABIs supported by MPI.jl. That is, we could add an additional keyword argument to the function shown in MPI.jl/lib/MPIPreferences/src/MPIPreferences.jl, lines 122 to 128 (commit 112c723), which would default to `nothing`. If users want to support a custom MPI ABI, however, they could pass the path to a manually generated ABI file, in which case the `abi` keyword argument would be ignored. That way, users would be able to use MPI.jl as an installed package, and system administrators could provide this as a default on a compute cluster.
This approach would still require users to manually create the ABI file, but at least it allows customization without having to clone MPI.jl and hack it in. We could also add a few sentences to the docs explaining the basic steps to create your own ABI constants file. What do you think about this idea? It would be great to get feedback from both the MPI.jl maintainers' perspective (@simonbyrne @vchuravy) and from other supercomputer operators (e.g. @omlins @carstenbauer).
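Sketched against the existing `use_system_binary` keywords, the proposal could look like this (the `abi_file` keyword, its semantics, and the example path are hypothetical, purely to illustrate the idea):

```julia
# Hypothetical extension of MPIPreferences.use_system_binary (sketch only):
using MPIPreferences
MPIPreferences.use_system_binary(;
    library_names = ["libmpi"],
    abi = nothing,   # would be ignored whenever abi_file is given
    # Hypothetical keyword: path to a manually generated ABI constants
    # file, e.g. one modeled on src/consts/mpt.jl.
    abi_file = "/opt/site/julia/mpt_consts.jl",
)
```

A site administrator could then ship the preference in a cluster-wide `LocalPreferences.toml`, so that users get the custom ABI by default.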
I'm not opposed to it, but I do wonder about the utility: are there more ABIs in the wild?
We had this discussion yesterday as well. For the majority of systems the answer is no: most university and commodity clusters are likely to use one of the "big two", i.e., either MPICH (or something compatible) or OpenMPI. However, especially on leadership systems, vendors tend to provide their own MPI implementations, which may or may not be compatible with MPICH; sometimes the implementations are "mostly" compatible but have some peculiarities. We had a longer discussion yesterday on how to proceed with this issue. It is likely that currently nobody has the motivation to recreate the auto-detection system used until v0.19, since nobody presently involved in MPI.jl has an issue with unknown ABIs anymore (including me). This most recent proposal is thus a compromise between developer effort and keeping the door open for not-yet-officially-supported ABIs.
I don't think it's necessary to re-introduce this ability; there are other issues where our time is better spent. Apart from this, and for the record: the package MPIconstants does just this. It compiles two small C files that output the requested information, both the compile-time settings (e.g. how MPI handles are implemented) and the run-time constants via a shared library (e.g. the value of a predefined handle such as MPI_COMM_WORLD).
Based on comments by @vchuravy, and as far as I understand from the code here, with the new MPIPreferences/"build-less" approach there is currently no way to use MPI.jl with MPI backends that are not already known to MPI.jl. IMHO this is very unfortunate, since up to the current release of MPI.jl it was possible to use just about any MPI implementation and have the auto-detection mechanism figure out the ABI.
Using Julia with an unknown MPI implementation currently seems possible only via MPItrampoline. While MPItrampoline is a great tool and will certainly (hopefully) make things much smoother in the future, it is still comparatively new and has not yet taken hold in most supercomputing centers. Therefore, HPC systems with non-compatible MPI ABIs (such as HPE's MPT, which is not compatible with any other MPI ABI) are precluded from using MPI.jl.
Since the current MPI.jl release still works technically flawlessly with unknown MPI implementations (at least on our system with HPE's MPT), I strongly suggest that, for the time being, we restore the ability to support MPI ABIs other than the big three + MPItrampoline. Ideally, one could have a (non-exported?) function to trigger the generation of an MPI constants file, which one could either feed locally into one's own MPI.jl installation (e.g. via preferences) or use as a basis for a PR adding a new officially supported ABI to MPI.jl (where appropriate). Otherwise it is much harder to support Julia with MPI on systems such as HLRS's Hawk, where the default MPI implementation is MPT and most available parallel tools such as HDF5 are provided for MPT.
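A hypothetical shape for such a helper (the function name and its existence are invented here for illustration; nothing like it currently exists in MPI.jl):

```julia
# Sketch of a hypothetical, non-exported helper in MPI.jl: it would
# compile a small C probe (à la MPIconstants) against the currently
# selected system MPI and write a constants file in the same format
# as the hand-written files in src/consts/.
MPI.generate_constants_file("mpt_consts.jl")
# The resulting file could then be used locally (e.g. via a preference)
# or cleaned up and submitted as a PR adding an officially supported ABI.
```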
cc @luraess