Two HPX applications can't run at the same time. #3453

Closed
khuck opened this issue Sep 11, 2018 · 5 comments

@khuck
Contributor

commented Sep 11, 2018

Expected Behavior

If one HPX application is running, another HPX application should be able to run.

Actual Behavior

The second application will fail with this exception:

terminate called after throwing an instance of 'hpx::detail::exception_with_info<hpx::exception>'
  what():  <unknown>: HPX(network_error)

This is particularly problematic on the Phylanx buildbot server: different configurations can't be built and tested concurrently because one of them will fail.

Steps to Reproduce the Problem

Run two HPX applications on the same host concurrently.

Specifications

  • HPX Version: current master
  • Platform (compiler, OS): GCC 7.1 on Red Hat Linux
@msimberg

Contributor

commented Oct 8, 2018

I don't know if this was already discussed offline, but I see a few options:

  • If several multi-locality HPX applications are running at once, this is probably unfixable, apart from giving a better error message.
  • If there's only one locality per application, use HPX_WITH_NETWORKING=OFF. This should allow you to run multiple applications simultaneously; if it doesn't, it's a bug.
  • If there's only one locality per application, we could try to skip the networking-related parts causing this error (binding ports?) at runtime, in the vein of HPX_WITH_NETWORKING=OFF, or
  • One has to specify different ports for each application, but this is tedious to do manually.
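
The suspected port conflict, and the effect of giving each application its own port, can be reproduced with plain TCP sockets. This is only a sketch under assumptions, not HPX code: `try_bind` is a hypothetical helper, and it assumes the failure comes from two processes binding the same TCP port, which this thread suspects but does not confirm.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Return a listening socket bound to host:port, or None if the port is in use.

    Hypothetical helper for illustration; not part of HPX.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    sock.listen(1)
    return sock


# Stand-in for the first application: port 0 lets the OS pick any free port.
first = try_bind(0)
port = first.getsockname()[1]

# Stand-in for a second application that insists on the same port.
second = try_bind(port)
same_port_rejected = second is None

# Stand-in for a second application that is given a different port
# (port 0 again, so the OS cannot hand out the one already in use).
third = try_bind(0)
distinct_port_accepted = third is not None and third.getsockname()[1] != port

for s in (first, third):
    if s is not None:
        s.close()
```

If the assumption holds, the second bind is the point at which a second application would fail, while the one with a distinct port starts normally.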
@biddisco

Contributor

commented Oct 8, 2018

The way most libraries do it is to randomize port numbers if there is a conflict. libfabric, for example, does this: instead of specifying the port number you want, you just ask for a free one and it tells you which one to use.
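
A minimal sketch of that fallback with plain TCP sockets: try the preferred port first and, only on a conflict, bind to port 0 so the operating system assigns a free one. `bind_with_fallback` is a hypothetical name used for illustration; it is neither libfabric's nor HPX's API.

```python
import errno
import socket


def bind_with_fallback(preferred_port, host="127.0.0.1"):
    """Bind to preferred_port if it is free, otherwise to an OS-assigned port.

    Returns (listening socket, port actually bound).
    Hypothetical helper for illustration only.
    """
    for port in (preferred_port, 0):  # port 0 means "any free port"
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            # Only fall back on a port conflict, and only once.
            if exc.errno != errno.EADDRINUSE or port == 0:
                raise
            continue
        sock.listen(1)
        return sock, sock.getsockname()[1]
```

The caller learns the real port from the return value, so the chosen port still has to be communicated to anything that needs to connect to it.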

@hkaiser

Member

commented Oct 8, 2018

@biddisco randomizing the port numbers might not work, as all localities connected to the same application instance need to share them.

@hkaiser

Member

commented Oct 8, 2018

@msimberg we could disable the initialization of the TCP parcelports if we know that an application is running on one locality and we don't expect any more localities to connect dynamically (which is almost always true during testing). We currently have a command line option that disables dynamic connections to a running HPX application. We might want to invert this and add one that enables dynamic connections instead. That would allow us to disable networking for all single-locality runs.
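
The proposed rule can be written down as a small predicate. This is a sketch of the decision only; `needs_tcp_parcelport` and its parameter names are hypothetical and do not correspond to any existing HPX function or option.

```python
def needs_tcp_parcelport(num_localities, allow_dynamic_connections):
    """Decide whether the TCP parcelport has to be initialized.

    Hypothetical sketch of the proposal: networking is needed when the
    application spans more than one locality, or when further localities
    may connect to it dynamically later on.
    """
    return num_localities > 1 or allow_dynamic_connections


# A typical test run: one locality, no dynamic connections expected.
skip_networking_in_tests = not needs_tcp_parcelport(1, False)
```

With dynamic connections as an opt-in, single-locality runs would skip port binding by default and could run side by side on the same host.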

@msimberg

Contributor

commented Oct 8, 2018

@hkaiser that sounds pretty reasonable. I didn't even realize that connecting dynamically was an option, but now that you say it, it makes sense. I'll try to have a look at this eventually but can't promise anything for 1.2.0. Is this an urgent issue for Phylanx?
