MPI initialization error #118
Comments
Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe): Sounds like an issue with MPI specifically. Try executing directly on the command line, e.g. mpirun -n 3 relion_refine_mpi ..... --j 1, and check the cmake configuration step of your installation for clues about which MPI version was detected.
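A minimal sketch of that check, assuming the RELION binaries are already on PATH and that the build directory is called build (both are assumptions):
# run the MPI binary directly from a shell, bypassing the GUI; the ..... stands for your usual refine arguments
mpirun -n 3 relion_refine_mpi ..... --j 1
# re-run the configure step and look at which MPI compiler/library CMake detects
cd build && cmake .. 2>&1 | grep -i mpi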
Original comment by Matthew Belousoff (Bitbucket: mbelouso, GitHub: Unknown): I have the same issue. The 'alpha' versions were working fine, but as soon as I built the open beta version these are the errors that I get (see run.err readout below). There is nothing weird when I run cmake, it finds OpenMPI with no dramas, and it doesn't matter if I run it with mpirun from the command line, the same problem exists. Command:
Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe): @mbelouso The only difference I can see between the 2.0.1 version and earlier versions (with regard to MPI) is associated with --scratch_dir. What happens when you omit that? Also, not knowing if you already did so, I would make a fresh build directory to make sure you don't have any old references or built libraries messing with your new 2.0.1 build.
As your output also explicitly states, this really indicates an error in MPI, but if you can verify two versions of RELION (like 2.0.1 and 2.0.b12) that you compiled in different directories on the same machine, using the same session and settings, and get an error from one and not the other, I will dig into sorting out what it is we are doing that causes it.
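A minimal sketch of such a clean rebuild, assuming the RELION source sits in ~/relion and that a build/ subdirectory is used (both paths are illustrative):
# remove any stale CMake cache and previously built libraries before reconfiguring
cd ~/relion
rm -rf build
mkdir build && cd build
cmake ..
make -j 4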
Original comment by Matthew Belousoff (Bitbucket: mbelouso, GitHub: Unknown): Bjoern, so I managed to fix it. It was a problem with the MPI installation on the computer, and a fresh installation of OpenMPI fixed it. Sorry to waste your time.
Original comment by Bjoern Forsberg (Bitbucket: bforsbe, GitHub: bforsbe): No worries, it's good to have documented symptoms that we know the reason for, even if it isn't a problem with RELION. @achintangal Did your issue work out as well?
Just had this issue too (relion 3.1.1), and mine turned out to be that EMAN2 happens to ship with its own version of mpirun, and this was on my PATH. Commenting out the EMAN2 lines in my .bashrc file was enough to get the relion functions working again. Only commenting since @bforsbe points out above that it's good to have documented symptoms that we know the reason for.
I'll need to cogitate about how to get my relion and EMAN2 installs to live happily next to each other...
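A quick way to check whether a bundled mpirun is shadowing the system one (the EMAN2 path below is an illustrative assumption):
# see which mpirun wins on PATH and which MPI library it belongs to
which mpirun
mpirun --version
# if it resolves into the EMAN2 tree, comment out the corresponding lines in ~/.bashrc, e.g.:
# export PATH=/path/to/eman2/bin:$PATH
# then open a new shell (or source ~/.bashrc) and check again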
Originally reported by: Abhiram Chintangal (Bitbucket: achintangal, GitHub: achintangal)
When trying to run jobs using more than 1 CPU (relion_refine_mpi) I get the following error:
ompi_mpi_init: orte_init failed
--> Returned "Error" (-1) instead of "Success" (0)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
[poissons:14318] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
For completeness, this is the command I am trying to run (from the note.txt file):
++++ Executing new job on Tue Oct 4 14:54:49 2016
++++ with the following command(s):
`which relion_refine_mpi` --o Class3D/job046/run --i Extract/job041/particles.star --ref Import/refs.star --firstiter_cc --ini_high 60 --dont_combine_weights_via_disc --pool 3 --ctf --iter 30 --tau2_fudge 4 --particle_diameter 300 --K 2 --flatten_solvent --zero_mask --strict_highres_exp 12 --oversampling 1 --healpix_order 2 --offset_range 3 --offset_step 2 --sym C1 --norm --scale --j 1 --gpu 0,1
++++
If I run the same job with only 1 CPU (using relion_refine instead), it runs fine.
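A minimal sanity check that isolates MPI from RELION (a sketch; the process count is arbitrary): if this also fails, the MPI installation or environment is at fault rather than RELION, as indeed turned out to be the case here.
# launch a trivial non-RELION program under mpirun
mpirun -n 3 hostname
mpirun --version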