CMSSW supporting MPI #18174
A new Issue was created by @perrozzi. @davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
please propose the appropriate set of configure arguments for OpenMPI |
I've been working on a local installation of Sherpa compiled against OpenMPI. I don't quite have it working at the moment but I hope to have more info soon. One thing I'm sure of is that the C++ bindings, which are deprecated in the MPI standard and are disabled in OpenMPI by default, are necessary. This requires the configuration option --enable-mpi-cxx. In general, is there any reason to prefer OpenMPI vs. MPICH2? If the program is compiled against OpenMPI, and a cluster supports MPICH2, will it work? |
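For reference, the flag in question would be passed roughly like this - a sketch only, with the install prefix as a placeholder rather than the eventual cmsdist configuration:

```shell
# Sketch: build OpenMPI from source with the deprecated C++ bindings
# enabled; the prefix and -j value are placeholders.
./configure --prefix="$HOME/openmpi" --enable-mpi-cxx
make -j4
make install
```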
it's probably best that the GEN group evaluate and make a request for what specifically is wanted. there aren't other known use cases (and thus few other experts on this sort of topic) in cmssw
|
@kdlong - one question is of expectations management. How many cores would you like to target for the Sherpa use case? I ask because the approach here is completely different if Sherpa can run on a single host (8-64 cores) versus multiple hosts. For a single host, one can just utilize OpenMPI with the shmem (shared memory) backend and disable all others; in such a case, it's irrelevant what the underlying cluster uses. For multiple host runs (i.e., using the shared fabric), there's no expectation of portability between clusters. After all, MPI stands for "Message Passing Interface"; implementations are not binary-compatible with each other. |
@bbockelm - my impression is that we should first target this single host workflow. Without MPI the jobs can run on exactly 1 core, so 8 - 64 would already be a huge improvement, even if it isn't sufficient for the most intensive processes. A version of Sherpa + OpenMPI compiled in CMSSW would be a huge help for this type of workflow. I'm working on testing this with a local installation and will hopefully have feedback in ~ a week. |
Gotcha - in that case, you simply want the shared-memory (sm) transport: https://www.open-mpi.org/faq/?category=sm One only needs to consider site-specific concerns if you need to go between nodes - otherwise, you really want to avoid the site's MPI stack. Next question - does Sherpa run inside CMSSW or as an ExternalLHEProducer? Embedded MPI library calls into CMSSW might require some careful thought. If it's an ExternalLHEProducer, then this should be straightforward. |
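For the single-host case, that amounts to pinning the byte-transfer layer to the on-node transports - for example (program name and core count are placeholders; `sm` is the shared-memory BTL name in the OpenMPI 1.x/2.x versions discussed here):

```shell
# Sketch: force OpenMPI to use only on-node transports, so the site's
# network fabric and MPI stack are irrelevant. Program name is a placeholder.
mpirun --mca btl self,sm -np 16 ./my_program
```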
But my understanding is that the event generation won't use MPI, just the gridpack generation, which is done externally. We need to confirm/test this, but I think it can work. @vciulli, does this seem right? |
yes indeed MPI is needed only to create the sherpack. in fact, what is currently done is to use a standalone version to create the sherpack somewhere like DESY, do some manipulation to make it comply with the "native CMSSW" sherpack, then use it as if it had been made using sherpa inside CMSSW, cfr. … |
It might be reasonable to continue to run sherpa standalone to produce the sherpacks, but just to do so directly with the compiled version shipped with CMSSW. |
yes, I agree, to avoid any inconsistency |
@pmillet can comment, but I think the command cmsRun is not invoked to create the sherpack, even if one uses the scripts prepared for using Sherpa inside CMSSW. MPI is certainly not needed to generate the events. I agree using multiple cores in a single machine is already a good starting point. |
cmsRun is not invoked when creating the sherpack |
apparently this was also tested in the past, see |
(@fabiocos please comment in case) |
This was indeed tested and used for private production (supporting SMP-12-017) in the branch IB/CMSSW_5_3_X/slc5_amd64_gcc462-sherpa2 with sherpa version 2.0.beta2, that stayed around for about one year. You may find in the same branch the openmpi.spec I used at that time. I never tested the setup on multiple nodes simultaneously, but used a 16 cores node for production of the sherpacks, and it worked. |
@perrozzi - if you guys take this approach with CMS Connect, I've been working to get more hosts available with >8 cores. The maximum currently is 56 (and you may end up in a quite long line for these). At 8 cores, you should get as many cores as you need. HTH! |
@bbockelm thanks for the info, this possibility will definitely be taken into account when MPI is integrated in cmssw |
@fabiocos thanks a lot, we should simply replicate what was done in 53x then |
@pmillet could you try to make a PR to copy what was in 53x? |
ok |
any news on this? |
So far I tried simply copying what was done in 53X to a recent release but got errors due to missing libraries. I will try again beginning of next week. Sorry for the delay. |
Hi @pmillet - If you get stuck, feel free to post here and let us know! Brian |
Ok, thanks! So this is what I did: pmillet/cmsdist@5621b0b
When I try to build sherpa it fails while building openmpi. The error message is the following:
RpmInstallFailed: Failed to install package openmpi. Reason: error: Failed dependencies: /bin/perl is needed by external+openmpi+1.6.5-1-1.x86_64 libbat.so()(64bit) is needed by external+openmpi+1.6.5-1-1.x86_64 liblsf.so()(64bit) is needed by external+openmpi+1.6.5-1-1.x86_64
Does anybody know what to include to get those files? |
it should use /usr/bin/env perl instead. You can follow an example like this one:
https://github.com/cms-sw/cmsdist/blob/539993e46e29ec20dceebfff18a9215919298f43/root.spec#L170 |
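The suggested rewrite can be demonstrated on a throwaway file - the real spec would apply the same substitution to whatever Perl scripts the package installs (the file below is a temp file, not one from the package):

```shell
# Rewrite a hard-coded '#!/bin/perl' shebang (the unmet RPM dependency
# above) to the portable '#!/usr/bin/env perl' form, shown on a temp file.
f=$(mktemp)
printf '#!/bin/perl\nprint "hello\\n";\n' > "$f"
sed -i -e '1s|^#!/bin/perl$|#!/usr/bin/env perl|' "$f"
head -n1 "$f"
```

With the shebang pointing at env, RPM no longer records a dependency on the literal path /bin/perl.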
thanks. I added the following lines to the openmpi spec file (https://github.com/pmillet/cmsdist/blob/sherpa_openmpi/openmpi.spec#L6-L9), and updated the version to a more recent one.
After building I now get the following error:
`Checking local path dependency for rpm package external+openmpi+2.1.0-cms just build.
Requested to quit.
Requested to quit.
Requested to quit.
Requested to quit.
• The action "install-external+openmpi+2.1.0-cms" was not completed successfully because Traceback (most recent call last):
File "/build/millet/ext/CMSSW_9_1_X/20170515_1016/PKGTOOLS/scheduler.py", line 199, in doSerial
result = commandSpec0
File "PKGTOOLS/cmsBuild", line 3017, in installPackage
File "PKGTOOLS/cmsBuild", line 2829, in installRpm
RpmInstallFailed: Failed to install package openmpi. Reason:
error: Failed dependencies:
libbat.so()(64bit) is needed by external+openmpi+2.1.0-cms-1-1.x86_64
liblsf.so()(64bit) is needed by external+openmpi+2.1.0-cms-1-1.x86_64
` |
Presumably those libs will be present on any worker node capable of openmpi?
If so - you can add something like this to the source part of the spec file:
Provides: libbat.so()(64bit)
Provides: liblsf.so()(64bit)
(but maybe @davidlt or others have better advice) |
please check /build/millet/ext/CMSSW_9_1_X/20170515_1016/a/BUILD/slc6_amd64_gcc530/external/openmpi/2.1.1/log file for more details |
@pmillet OR make a PR with your changes and we will try to build/debug the issue |
That appears to be a bug in the OpenMPI configure script. Two potential approaches (assuming you don't want to go through the guts of OpenMPI's build system):
• Post all the relevant logs here and see if David or Shahzad have time / effort to investigate and fix OpenMPI.
• Build within a Docker container - or personal VM - that doesn't have the LSF headers installed. Then, LSF support won't be detected and OpenMPI will avoid trying to use the problematic feature. |
or file an issue with openmpi.. seems there is something similar in the last two weeks: open-mpi/ompi#3447 |
Thanks again for the suggestions. The issue @davidlange6 linked seems to be similar to the one I encountered. I opened an issue in the openmpi repo open-mpi/ompi#3546. |
It looks like the fix the Open-MPI people propose needs a more recent version of autotools. I tried building with pmillet/cmsdist@fce2afc and it fails with:
Should I try asking for a patch which does not need a newer version of autotools? Or is updating autotools a possibility? I attached the full log to this post. Thanks again for your help. |
We could bump automake in DEVEL IBs. Might need some minor cleaning up in the worst case scenario. The fix does not directly require automake 1.15, but once you modify the file the build scripts need to be regenerated. They were originally generated with automake 1.15, which was released more than 2 years ago. |
How should we proceed with this? I tried what is suggested in open-mpi/ompi#3546, which is to apply the patch on a system which has a recent automake, run
./autogen.pl
./configure
make dist
and use this tarball on the SLC6 machine. However I still get the same error as above. Will there be an automake update? Or should I keep trying to use the method from above? Thanks a lot for your advice. |
it looks like @davidlt did update the gcc700 branch to automake 1.15 - can you try that (or is your needed patch part of the openmpi pull request now)?
|
ok, i'll try the gcc700 branch |
You are getting this error because you are probably applying the patch to a release tarball, which has all build scripts already generated. Use |
I used the --force option (sorry it is missing in the commands above), since autogen.pl refused to run without (for the reason you explained). |
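So, with the missing flag included, the full sequence run on a machine with recent autotools would be along these lines (the patch file name is hypothetical):

```shell
# Regenerate the OpenMPI build scripts after patching (run where
# automake >= 1.15 is available; the patch filename is a placeholder).
cd openmpi-2.1.0
patch -p1 < ../configure-lsf-fix.patch
./autogen.pl --force
./configure
make dist    # produces a tarball with freshly generated build scripts
```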
ping: any news? |
@pmillet , did you try it with gcc700 cmsdist branch? |
yes. OpenMPI builds fine with the patch and the updated automake. Yesterday I just had the problem that sherpa would not build anymore. I modified the build file and it's currently trying to build again. I'll report the outcome as soon as it is ready. |
once you have it working then please make a CMSDIST PR for the gcc700 branch. |
so I still get the same error as yesterday when building sherpa
logs: |
Sherpa cannot find … Check if it exists under … |
From …
If both are needed by Sherpa, enable them explicitly. |
It finally seems to have worked. PR (cms-sw/cmsdist#3108). Thanks for all the comments and suggestions. |
somehow my answer did not make it till here:
currently 93X
There is no dedicated documentation for this yet. In principle it should be enough to add -m 'mpirun -n NCORES' to the MakeSherpaLibs.sh call.
So far the only thing to report would be that it is included in 93X. I could make one slide for this if you want.
From the Sherpa side I do not see an issue with backporting it. |
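An invocation along the lines described above might look like this - only the -m option is taken from the message; check the script's usage text for the remaining arguments:

```shell
# Hypothetical sherpack-generation call on a 16-core node; everything
# except the -m option described above is an assumption.
./MakeSherpaLibs.sh -m 'mpirun -n 16'
```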
Hi Philipp, could you please backport the feature to 71X, add a line with the instructions to the Sherpa twiki, and prepare a slide for today's GEN meeting? Thanks |
When trying to build openmpi in 71X the following message appears:
+ ./autogen.pl --force
Open MPI autogen (buckle up!)
1. Checking tool versions
Searching for autoconf
Found autoconf version 2.68; checking version...
Found version component 2 -- need 2
Found version component 68 -- need 69
==> Too low! Skipping this version
=================================================================
I could not find a recent enough copy of autoconf.
I need at least 2.69, but only found the following versions:
autoconf: 2.68
I am gonna abort. :-(
Please make sure you are using at least the following versions of the
tools:
GNU Autoconf: 2.69
GNU Automake: 1.12.2
GNU Libtool: 2.4.2
=================================================================
error: Bad exit status from /build/millet/ext/CMSSW_7_1_X/20170717_2053/a/tmp/rpm-tmp.jJ3QkI (%prep)
Can we get the newer autoconf also in 71X? |
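The check autogen.pl is performing amounts to a dot-separated minimum-version comparison; a rough shell equivalent (reimplemented here for illustration - the real script is Perl and compares component by component):

```shell
# Rough shell equivalent of autogen.pl's minimum-version check
# (illustration only, not the actual OpenMPI code).
need_at_least() {
  # succeed if the found version ($2) >= the required version ($1)
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}
need_at_least 2.69 2.68 || echo "autoconf 2.68 is too old (need 2.69)"
```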
just get the latest autotools.spec from https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_9_3_X/gcc630/autotools.spec into your 71x cmsdist. Once everything builds then make a Pull Request. |
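One way to do that (raw-file URL derived from the blob link above; the checkout path is a placeholder):

```shell
# Sketch: drop the newer autotools.spec into a 71X cmsdist checkout.
cd cmsdist   # your 71X-era cmsdist checkout
curl -fsSL -o autotools.spec \
  https://raw.githubusercontent.com/cms-sw/cmsdist/IB/CMSSW_9_3_X/gcc630/autotools.spec
```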
Can this issue be closed? |
yes this is complete |
As detailed in the talk during the O&C week (https://indico.cern.ch/event/624140/contributions/2533506/attachments/1438011/2212147/Long_GENComputingReport_2017_04_03.pdf), we would like to have MPI support inside CMSSW, to boost the usage of the Sherpa MC generator.
From what I understood during the discussion, this should be relatively straightforward.
@kdlong @vciulli @bendavid can comment further