No error output when using mpirun with hpx #1221

Closed
pagrubel opened this issue Aug 12, 2014 · 7 comments

Comments

@pagrubel
Member

When running HPX with mpirun, errors are not reported. For example, an application that was not compiled with the MPI parcelport simply hangs. Other errors cause the same hang, but I can't say what they are because we can't see them.

@parsa
Contributor

parsa commented Aug 12, 2014

I think it's a problem with SLURM. It doesn't happen only for MPI applications run with mpirun; it happens whenever you are not in an interactive bash session opened with srun. If you try to build something that has compilation errors, you will not see those errors and the SLURM job just hangs.

@hkaiser
Member

hkaiser commented Aug 12, 2014

What do I need to do to reproduce the issue?

@sithhell
Member

This is a non-issue. There is not much we can do here. It's perfectly valid to run non-MPI applications with mpirun. Try "mpirun echo hello", for example. The problem is that some mpirun implementations don't forward the environment properly. As a result, every HPX process waits for someone else to connect. It's a configuration problem on the user side, and HPX cannot really detect it.

@sithhell
Member

Will close as "won't fix", as I think there is nothing we can do here.

@hkaiser
Member

hkaiser commented Aug 13, 2014

Well, I have not even understood what the problem was... I was waiting for some explanations.
Why have you closed this?

@sithhell
Member

I closed it because I got no reply to my explanation above. So here it is again:
mpirun a.out simply launches a.out. Depending on the environment, this may start just one process or N processes, possibly on different machines. It can also happen that mpirun doesn't pass the environment on correctly. In that case, what Pat sees is the equivalent of running N processes like this:
./a.out --hpx:localities=N --hpx:console
There is absolutely nothing we can do about that, IMHO. As such, I closed the ticket as "won't fix".
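
One way to check on the user side whether the launcher's environment reaches the started processes is to run a small script under the same mpirun or srun command. The sketch below is only an illustration: it looks for the rank and size variables that Open MPI, MPICH-style PMI launchers, and SLURM commonly export. Which variables HPX itself consults is not stated in this thread, so the variable list and the helper name detect_launcher_env are assumptions.

```python
import os

# Rank/size variables commonly exported to each launched process.
# Assumption: this list is illustrative; the thread does not say which
# variables HPX reads.
LAUNCHER_VARS = {
    "Open MPI": ("OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE"),
    "MPICH/PMI": ("PMI_RANK", "PMI_SIZE"),
    "SLURM": ("SLURM_PROCID", "SLURM_NTASKS"),
}


def detect_launcher_env(env=None):
    """Return {launcher: {var: value}} for launchers whose variables are all set."""
    env = os.environ if env is None else env
    found = {}
    for launcher, names in LAUNCHER_VARS.items():
        if all(name in env for name in names):
            found[launcher] = {name: env[name] for name in names}
    return found


if __name__ == "__main__":
    found = detect_launcher_env()
    if not found:
        # Without these variables a process cannot tell it is one of N.
        print("no launcher rank/size variables visible in this process")
    for launcher, values in found.items():
        print(launcher, values)
```

Saved under a hypothetical name such as check_env.py and started as "mpirun -np 2 python check_env.py", each process prints what it can see. If nothing is reported, the processes have no way to know they belong to one job, which matches the hang described above.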

@pagrubel
Member Author

Well, that's what I suspected before I posted it.
