Hung test leads to cascading test failure; make tests should support the MPI parcelport #879

Closed
eschnett opened this issue Sep 25, 2013 · 22 comments

Comments

@eschnett
Contributor

From time to time, an HPX program refuses to start with the error message

hpx::init: std::exception caught: bind: Address already in use: HPX(network_error)

I am using the MPI parcelport, and start the program via mpirun; in this case via

mpirun -np 2 ./bin/block_matrix --hpx:thread=8

This error seems to occur when another HPX program is running on my workstation. This could be either an old program that didn't shut down properly and is running in the background, or (in this case) HPX programs running via "make tests".

@eschnett
Contributor Author

This issue can also lead to a cascade of self-test failures. In this case, all self tests running in another terminal from #57 to #77 failed because I tried to run another HPX program at the same time.

@hkaiser
Member

hkaiser commented Sep 25, 2013

What do you suggest?

@eschnett
Contributor Author

If HPX is supposed to be able to run multiple times on a node, then this is an error that needs to be corrected.

If HPX has not been designed for this, then I suggest making it possible. For example, if HPX wants to use a particular port range, it could check at startup whether this range is available and, if not, choose a different range.

Since this uses the MPI parcelport, I would expect HPX to use MPI for communication, which should have this problem already solved.
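
The startup check suggested above could look roughly like the following. This is a minimal sketch using only the Python standard library, not HPX code: the function name `first_free_port`, the loopback host, and the default starting port (7912, borrowed from the port numbers used elsewhere in this thread) are assumptions made for illustration.

```python
import socket


def first_free_port(host="127.0.0.1", start=7912, count=16):
    """Return the first port in [start, start + count) that can be bound, else None.

    Hypothetical helper for illustration; it is not part of HPX.
    """
    # Never probe past the highest valid TCP port number.
    for port in range(start, min(start + count, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted); try the next one
            return port
    return None


if __name__ == "__main__":
    print(first_free_port())
```

Note that the probe socket is closed before the caller binds the port for real, so another process can still take the port in between. A check like this narrows the window for "Address already in use" but does not remove it.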

@hkaiser
Member

hkaiser commented Sep 25, 2013

You're right that this shouldn't happen when the MPI parcel port is used. I will fix this. (EDIT: Thomas' suggestion below solves this issue).

I'm not sure if just binding to an arbitrary port from a given range is a good idea. It might be appropriate to do when the application is run on a single locality, but it will not be possible to implement in the general case for distributed applications. Those assume using a given port (on the console) in order to successfully connect worker localities.

As a workaround, you can always specify the ports to use explicitly, e.g.

app_on_locality_0 -a node_0:7912 -x node_0:7912
app_on_locality_1 -a node_0:7912 -x node_1:7913

@sithhell
Member

Even though the MPI parcelport is used, the TCP parcelport is started as well, which leads to this error. To avoid the problem completely, you can disable the TCP parcelport on the command line with -Ihpx.parcel.tcp.enable=0.

@gbibek
Member

gbibek commented Sep 25, 2013

kill_process_tree in python.py might be the problem. Let me look into this.

On Wed, Sep 25, 2013 at 11:21 AM, Hartmut Kaiser <notifications@github.com> wrote:

What do you suggest?



@sithhell
Member

Closed prematurely. We need to add an option to disable the TCP parcelport at compile time.

@sithhell sithhell reopened this Sep 25, 2013
@eschnett
Contributor Author

Is there a way to disable the TCP parcelport for the self tests? (Disabling it at build time would also work.)

@sithhell
Member

On 25.09.2013 at 19:39, "Erik Schnetter" <notifications@github.com> wrote:

Is there a way to disable the TCP parcelport for the self tests?
(Disabling it at build time would also work.)

Unfortunately not... Right now the self tests are hardcoded to be used with the TCP parcelport only.



@hkaiser
Member

hkaiser commented Sep 25, 2013

Unfortunately not... Right now the self tests are hardcoded to be used with
the TCP parcelport only.

That's not entirely correct. The tests use whatever is configured as the default. A simple

export HPX_HAVE_PARCELPORT_TCPIP=0 ; make tests

does the trick. Moreover, there is a way to pass arbitrary arguments from the make tests command line down to all tests; I just forgot how this works :-P (seems we need more documentation on this).

@hkaiser
Member

hkaiser commented Sep 25, 2013

OK, I figured it out. This feature was accidentally removed when we switched to CTest. I reopened #843 (Tests should use CTest) and added a comment there.

@eschnett
Contributor Author

When I use this to run the tests, they fail with

hpx::init: std::exception caught: unsupported connection type 'connection_tcpip': HPX(bad_parameter)

@hkaiser
Member

hkaiser commented Sep 25, 2013

Yah, Thomas just reminded me that you still need to run the tests with mpirun, which is not done automatically when you follow my suggestion. My bad.

@brycelelbach
Member

BTW - the cleanup code in hpx_run_tests.py has recently been made more robust. Previously it only killed direct children of the parent process.
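
To illustrate the difference between killing only direct children and killing the whole tree, here is a minimal sketch of how the full set of descendants can be collected from a pid-to-parent-pid table. It is not the code in hpx_run_tests.py; the function name `descendants` and the dictionary input are assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict, deque


def descendants(root, parent_of):
    """Return every pid below root, breadth-first.

    parent_of maps pid -> parent pid. Hypothetical helper for illustration.
    """
    # Invert the table so each parent can look up its children.
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    found, seen, queue = [], {root}, deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:  # guard against malformed, cyclic tables
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


if __name__ == "__main__":
    table = {2: 1, 3: 2, 4: 2, 5: 3, 6: 1}
    print(descendants(2, table))
```

A cleanup routine that only looked at direct children would stop at pids 3 and 4 in this table and leave pid 5 running, which is the kind of leftover process that can keep a port bound.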

@ghost ghost assigned brycelelbach Oct 5, 2013
@brycelelbach
Member

Fixed, you can now pass arguments via make HPX_TEST_ARGUMENTS=...

@sithhell
Member

What about the MPI parcelport? How is it properly started?

@sithhell sithhell reopened this Oct 10, 2013
@hkaiser
Member

hkaiser commented Oct 10, 2013

Wouldn't

make HPX_TEST_ARGUMENTS=-Ihpx.parcelport.bootstrap=mpi tests

do the trick already?

@sithhell
Member

On 10.10.2013 at 13:13, "Hartmut Kaiser" <notifications@github.com> wrote:

Wouldn't

make HPX_TEST_ARGUMENTS=-Ihpx.parcelport.bootstrap=mpi tests

do the trick already?

No. You still need to call the tests with mpirun.



@brycelelbach
Member

I thought that HPX could launch the MPI parcelport without mpirun? Isn't that why we needed HPX_TEST_ARGUMENTS?

@brycelelbach
Member

You can now do the following:

make HPX_TEST_LAUNCHER=mpirun HPX_TEST_ARGUMENTS=-Ihpx.parcelport.bootstrap=mpi tests

The invocation of the tests is now formatted like this:

<launcher> <test name>_<test suffix> <automatically generated arguments> <user specified arguments>
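
As a rough illustration of that ordering, the sketch below assembles an argument list in the same sequence. It is not the actual test driver: the function name `build_test_command`, its parameters, and the sample test name and arguments are all hypothetical.

```python
import shlex


def build_test_command(test_name, suffix, auto_args, launcher="", user_args=""):
    """Assemble: <launcher> <test name>_<test suffix> <auto args> <user args>.

    Hypothetical helper; launcher and user_args are shell-style strings.
    """
    command = shlex.split(launcher)           # e.g. "mpirun", may be empty
    command.append(f"{test_name}_{suffix}")   # the test executable
    command.extend(auto_args)                 # arguments generated by the harness
    command.extend(shlex.split(user_args))    # arguments supplied by the user
    return command


if __name__ == "__main__":
    print(build_test_command(
        "block_matrix", "test", ["--example-auto-arg"],
        launcher="mpirun",
        user_args="-Ihpx.parcelport.bootstrap=mpi",
    ))
```

Putting the user-specified arguments last means they appear after the automatically generated ones on the command line, matching the format shown above.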

@brycelelbach brycelelbach reopened this Oct 20, 2013
@sithhell
Member

What's left to do here?

@hkaiser hkaiser closed this as completed Mar 25, 2014
@hkaiser hkaiser reopened this Mar 25, 2014
@hkaiser hkaiser modified the milestones: 0.9.9, 0.9.8 Mar 25, 2014
@sithhell
Member

This issue has been resolved. The test suite now tests all parcelports, and failed tests are cleaned up by CTest.


5 participants