When using runtime_mode_connect, find the correct localhost public ip… #1654

Merged
merged 1 commit into from Jul 8, 2015

Conversation

biddisco
Contributor

@biddisco biddisco commented Jul 6, 2015

… address

If using --hpx:connect, the worker needs to know its own public ip address
so that it can send it to the hpx:agas root, where it is used
for communication. If the user does not specify the hpx parcel address
then 127.0.0.1 is assumed, but this only works for localities launched
on the same node.

This patch finds the public ip address and uses that instead.

It may still fail if the node has multiple ip addresses,
but it is a better guess than before: provided those ip addresses
support tcp connections, all should be ok.

This is particularly useful when the worker is spawned using

srun app <params> --hpx:hpx=xx.xx.xx.xx:port

where the user does not know which node srun will spawn the worker on
so cannot easily put the correct ip address on the command line.
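
The idea in the commit message (prefer a non-loopback address over the 127.0.0.1 default) can be sketched roughly as follows. This is not the code from the patch: it uses the POSIX `getifaddrs` call rather than the asio-based code discussed in this PR, it handles IPv4 only, and the names `is_loopback_v4`, `pick_public_v4` and `local_v4_addresses` are hypothetical helpers introduced for illustration.

```cpp
// Sketch only, under the assumptions stated above (POSIX, IPv4).
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <string>
#include <vector>

// True for any textual IPv4 address in 127.0.0.0/8.
bool is_loopback_v4(const std::string& ip)
{
    return ip.rfind("127.", 0) == 0;
}

// Hypothetical helper: return the first non-loopback candidate, falling
// back to 127.0.0.1 (the previous default) if there is none.
std::string pick_public_v4(const std::vector<std::string>& candidates)
{
    for (const auto& ip : candidates)
        if (!is_loopback_v4(ip))
            return ip;
    return "127.0.0.1";
}

// Collect the IPv4 addresses of all interfaces that are currently up.
std::vector<std::string> local_v4_addresses()
{
    std::vector<std::string> result;
    ifaddrs* list = nullptr;
    if (getifaddrs(&list) != 0)
        return result;  // enumeration failed: report no addresses

    for (ifaddrs* it = list; it != nullptr; it = it->ifa_next)
    {
        if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET)
            continue;  // skip non-IPv4 entries
        if (!(it->ifa_flags & IFF_UP))
            continue;  // skip interfaces that are down

        assert(it->ifa_addr->sa_family == AF_INET);  // precondition of the cast
        const auto* sin = reinterpret_cast<const sockaddr_in*>(it->ifa_addr);
        char buf[INET_ADDRSTRLEN] = {};
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr)
            result.emplace_back(buf);
    }
    freeifaddrs(list);
    return result;
}
```

Calling `pick_public_v4(local_v4_addresses())` would give one guess at the address to advertise. As the commit message notes, a node with several interfaces may still yield the wrong one, since this sketch simply takes the first non-loopback entry.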

@hkaiser
Member

hkaiser commented Jul 6, 2015

This functionality is already available from:

https://github.com/STEllAR-GROUP/hpx/blob/master/hpx/util/asio_util.hpp#L26
https://github.com/STEllAR-GROUP/hpx/blob/master/src/util/asio_util.cpp#L51

You might want to consider calling resolve_hostname() directly to avoid duplicating this code.

@biddisco
Contributor Author

biddisco commented Jul 6, 2015

I believe that I tried that code, but if you resolve the hostname "localhost", it returns "127.0.0.1", which is then used by the remote locality to send stuff back (AFAICT the ip is exchanged with the remote node which tries to send data back to 127.0.0.1 and fails). The patch correctly gets the public ip address of the local node which is then used by the other nodes and works. TBH I'm not 100% certain of what is going wrong, but this patch does get the right ip address and enables me to use srun to launch nodes and connect back without knowing the ip address of the nodes I'm launching on beforehand.
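
The failure mode described here (a resolved address that is only meaningful on the local machine) can be illustrated with a small resolver check. This is a sketch using POSIX `getaddrinfo`, not HPX's `resolve_hostname()` or the asio resolver; `resolve_v4` and `only_loopback` are hypothetical names, and only IPv4 is considered.

```cpp
// Sketch only, under the assumptions stated above (POSIX, IPv4).
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <string>
#include <vector>

// Resolve a host name (or numeric address) to its IPv4 addresses.
std::vector<std::string> resolve_v4(const std::string& host)
{
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 only, to keep the sketch short
    hints.ai_socktype = SOCK_STREAM;  // one entry per address, not per socket type

    std::vector<std::string> result;
    addrinfo* list = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &list) != 0)
        return result;  // resolution failed: report no addresses

    for (addrinfo* ai = list; ai != nullptr; ai = ai->ai_next)
    {
        assert(ai->ai_family == AF_INET);  // guaranteed by the hints above
        const auto* sin = reinterpret_cast<const sockaddr_in*>(ai->ai_addr);
        char buf[INET_ADDRSTRLEN] = {};
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr)
            result.emplace_back(buf);
    }
    freeaddrinfo(list);
    return result;
}

// True if a remote locality could not use any of these addresses to reach
// this node: the list is empty or holds only 127.0.0.0/8 entries.
bool only_loopback(const std::vector<std::string>& addrs)
{
    for (const auto& ip : addrs)
        if (ip.rfind("127.", 0) != 0)
            return false;
    return true;
}
```

On a typical system `resolve_v4("localhost")` would be expected to return 127.0.0.1, so `only_loopback` would flag it as unusable from another node. Resolving the machine's own host name may or may not give a routable address, depending on how the node's name resolution is configured.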

@hkaiser
Member

hkaiser commented Jul 6, 2015

Nod, I can see how that's happening. Could you then factor out your functionality into a separately compiled function (add it to the asio_util pair of files) to avoid having the direct asio code intermixed with the command line handling?

@biddisco
Contributor Author

biddisco commented Jul 6, 2015

Will do. Also I'll play with the existing code in util just in case I messed up. I can see that in my code I query "" and not "localhost"; it might be that if I try that, it will get the ip I want.

@biddisco
Contributor Author

biddisco commented Jul 6, 2015

I put the code into hpx::util

@hkaiser
Member

hkaiser commented Jul 6, 2015

LGTM, thanks!

hkaiser added a commit that referenced this pull request Jul 8, 2015
When using runtime_mode_connect, find the correct localhost public ip…
@hkaiser hkaiser merged commit 527054c into master Jul 8, 2015
@hkaiser hkaiser deleted the fixing_1632 branch July 8, 2015 18:22