--bind-to check prioritized. Use iface address instead of loopback if available. #6030

Merged
merged 1 commit into from
Apr 17, 2014

amitmurthy
Contributor

This PR does the following:

  • The --bind-to command line option is used to solve the multi-homed node issue. It is processed before other arguments, and LPROC.bind_addr is set to the IP address given on the command line if present; otherwise it is set to whatever getipaddr() returns.
  • Detects which of the processes are on the localhost and uses only the loopback address to connect to them. This is resilient across system sleep/wake cycles.
  • A machine line in the --machinefile file format (and likewise in addprocs(machines::Vector)) is now of the form [user@]host[:port] [bind_addr], i.e., it supports an optional bind_addr field that is passed as a command line argument to the started workers.
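
A minimal sketch of the first and third points above, written against current Julia rather than taken from the code in this PR. The helper names choose_bind_addr and parse_machine and the MachineSpec type are hypothetical and exist only for illustration; only getipaddr(), the --bind-to option, and the [user@]host[:port] [bind_addr] format come from the description. The sketch assumes --bind-to takes its address as the following argument, and it does not handle IPv6 literals in the host field.

```julia
using Sockets  # provides IPAddr, getipaddr and the ip"..." string macro

# Hypothetical helper: prefer the address given after --bind-to,
# otherwise fall back to whatever getipaddr() returns.
function choose_bind_addr(args::AbstractVector{<:AbstractString})
    i = findfirst(==("--bind-to"), args)
    if i !== nothing && i < length(args)
        return parse(IPAddr, args[i + 1])
    end
    return getipaddr()
end

# Hypothetical container for one parsed machine line.
struct MachineSpec
    user::Union{String,Nothing}
    host::String
    port::Union{Int,Nothing}
    bind_addr::Union{String,Nothing}
end

# Hypothetical parser for "[user@]host[:port] [bind_addr]".
function parse_machine(line::AbstractString)
    fields = split(strip(line))
    isempty(fields) && throw(ArgumentError("empty machine specification"))
    length(fields) > 2 && throw(ArgumentError("too many fields in: $line"))

    # The optional second field is the bind address passed on to the worker.
    bind_addr = length(fields) == 2 ? String(fields[2]) : nothing

    target = fields[1]
    user = nothing
    if occursin('@', target)
        u, target = split(target, '@'; limit=2)
        user = String(u)
    end

    port = nothing
    if occursin(':', target)
        target, p = split(target, ':'; limit=2)
        port = parse(Int, p)
    end

    return MachineSpec(user, String(target), port, bind_addr)
end
```

For example, parse_machine("amit@node1:2222 10.0.0.5") yields user "amit", host "node1", port 2222 and bind_addr "10.0.0.5", while parse_machine("node2") leaves every optional field as nothing.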

Closes #5995

@jiahao
Member

jiahao commented Mar 3, 2014

On this branch the addprocs command stalls for me:

julia> addprocs(["127.0.0.1", "127.0.0.1"])

ssh: connect to host 127.0.0.1 port 22: Connection refused
ssh: connect to host 127.0.0.1 port 22: Connection refused


^C
Program received signal SIGINT, Interrupt.
0x000000010012d4c9 in add_page (p=0x100cfc200) at gc.c:460
460         v = (gcval_t*)((char*)v + p->osize);

@amitmurthy
Contributor Author

you need to be running the ssh daemon on localhost since you are using the ssh mode of addprocs to launch workers on localhost.
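
For workers on the same machine, the integer form of addprocs launches local processes directly and does not go through ssh. A minimal sketch, assuming a current Julia where these functions live in the Distributed standard library (at the time of this PR they were in Base):

```julia
using Distributed

# Launch two workers as local processes; no ssh daemon is needed for this form.
pids = addprocs(2)

# Each worker gets its own process id, distinct from the master (id 1).
println("worker ids: ", pids)

# Shut the workers down again.
rmprocs(pids)
```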

@amitmurthy
Contributor Author

gdb also does not print a stack

julia> addprocs(["127.0.0.1", "127.0.0.1"])
[New Thread 0x7fffeb137700 (LWP 6159)]
...
[New Thread 0x7fffe9934700 (LWP 6162)]
2-element Array{Any,1}:
 2
 3

julia> exit(1)
[Thread 0x7fffe9934700 (LWP 6162) exited]
...
[Thread 0x7fffdce3c700 (LWP 6054) exited]
[Inferior 1 (process 6045) exited with code 01]
(gdb) Segmentation fault

(gdb) where
No stack.
(gdb) 

@jiahao
Member

jiahao commented Mar 3, 2014

Oh right, duh.

# Currently disabled since this caused processes to spin instead of
# exit when process 1 shut down. Don't yet know why.
#redirect_stderr(STDOUT)
redirect_stderr(STDOUT)
Contributor Author

Uncommented the above since it does not seem to be a problem now.

@amitmurthy
Contributor Author

@JeffBezanson - please have a look.

@amitmurthy
Contributor Author

Updated to ensure that check_same_host does not use IP addresses when checking for localhost workers. This addresses issues caused by a system sleep/wake cycle.

@bjarthur
Contributor

bjarthur commented Apr 8, 2014

this PR fixes my problems doing an addprocs() followed by an addprocs_sge() on a cluster whose nodes have two NICs. thanks amit. i tried merging into this branch the latest commits from master, and got a couple conflicts...

@amitmurthy
Contributor Author

Good. I'll rebase once @JeffBezanson takes a look and is OK with this approach.

@JeffBezanson
Member

Looks good.

amitmurthy added a commit that referenced this pull request Apr 17, 2014
--bind-to check prioritized. Use iface address instead of loopback if available.
amitmurthy merged commit c472f7e into JuliaLang:master Apr 17, 2014

Successfully merging this pull request may close these issues.

Store external ip in case of localhost addprocs