make julia -p N use fork instead of exec #985

Closed
StefanKarpinski opened this issue Jun 27, 2012 · 12 comments
Labels: decision (a decision on this change is needed), performance (must go faster), won't change (work won't continue on an issue or pull request)

Comments

@StefanKarpinski
Sponsor Member

The tricky bit here is making sure each process knows that it isn't process 1.

@ghost ghost assigned StefanKarpinski Jun 27, 2012
@JeffBezanson
Sponsor Member

Summary of state that needs to be reinitialized:

  • empty _jl_fd_handlers
  • empty Workqueue
  • empty Waiting
  • set PGRP = ProcessGroup(0, {}, {})
  • empty ioq in sys.c. But I don't know what to do if the IO thread is in the middle of sending something.

Then we need to run start_worker, except the part that sets Scheduler, because it is a constant. Maybe it will have to become a non-constant, because the current scheduler may not be in a consistent state and needs to be replaced.
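The reset described above can be sketched outside Julia. The following is a minimal Python illustration, not Julia's implementation: `fd_handlers`, `work_queue`, `waiting`, and `process_id` are hypothetical stand-ins for `_jl_fd_handlers`, `Workqueue`, `Waiting`, and the process identity, and the IO-thread problem mentioned in the list is not addressed at all.

```python
import os

# Hypothetical stand-ins for the per-process state listed above.
fd_handlers = {3: "handler"}
work_queue = ["task-a", "task-b"]
waiting = ["task-c"]
process_id = 1  # the parent believes it is process 1


def reset_worker_state(new_id):
    """Clear inherited scheduler state so the child starts clean."""
    global process_id
    fd_handlers.clear()
    work_queue.clear()
    waiting.clear()
    process_id = new_id


def fork_worker(new_id):
    """Fork without exec; return (child pid, state summary sent by the child)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it inherited copies of everything above, so reset first.
        status = 1
        try:
            os.close(read_end)
            reset_worker_state(new_id)
            summary = f"{process_id},{len(fd_handlers)},{len(work_queue)},{len(waiting)}"
            os.write(write_end, summary.encode())
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    # Parent: its own state is untouched by the child's reset.
    os.close(write_end)
    with os.fdopen(read_end, "rb") as reader:
        report = reader.read().decode()
    os.waitpid(pid, 0)
    return pid, report
```

Calling `fork_worker(2)` leaves the parent's queues as they were, while the child reports an empty state and its new id. The sketch also shows the concern raised above: anything not explicitly reset is silently shared with the parent's snapshot.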

This could use the feature in process.jl that allows spawning a julia function like a process, since that does fork without exec.

It might be annoying that processes started this way share some state (like which files have been loaded), while remote processes will not. This means something tested with local processes might not work when distributed.

So in summary I'm not fully convinced this is a good idea.

@Keno
Member

Keno commented Jun 28, 2012

Plus, it's not possible to do cleanly on Windows (though of course it would be possible to have different approaches for Unix and Windows).

@StefanKarpinski
Sponsor Member Author

Ok, I'm going to mark this as "won't fix" and close it then.

@wavexx
Contributor

wavexx commented Jul 14, 2014

I'd like to reopen this issue, since I think fork() is an essential tool for performance tuning on Unix.

Since the issue is already closed, I'd like to change your mind with a more detailed argument:

  • Julia would definitely benefit from platform-dependent modules, such as Os.Unix, Os.Windows, etc., that expose basic OS-dependent functionality such as fork, dup, and the like. Note that this code is not meant for users; it's meant for the core Julia library.
  • You don't need to expose fork at all. In fact, you can bury it inside the ClusterManager. If you're adding a worker on localhost using addhosts, you can optimize the channel by creating a dup-ed descriptor and forking, and pretty much everything else will work unchanged. But of course, I don't think I need to explain the advantage of using fork here.
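As a rough illustration of the second point, here is a Python sketch of a local worker that is forked rather than exec'd and talks to its parent over an inherited descriptor. It uses `socket.socketpair` instead of an explicit `dup`, which is a substitution made for brevity; `add_local_worker` and `handle_message` are hypothetical names, not part of any ClusterManager API.

```python
import os
import socket


def add_local_worker(handle_message):
    """Fork a worker connected to the parent by a socket pair.

    Returns (pid, parent_socket). `handle_message` is a hypothetical
    callback that runs in the child for each request.
    """
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child: keep only its own end of the channel.
        status = 1
        try:
            parent_sock.close()
            while True:
                request = child_sock.recv(4096)
                if not request:  # parent closed its end
                    break
                child_sock.sendall(handle_message(request))
            status = 0
        finally:
            os._exit(status)
    # Parent: keep only its own end of the channel.
    child_sock.close()
    return pid, parent_sock
```

The parent can then write a request to the returned socket and read the reply, with no TCP setup and no interpreter startup cost. Whether the same shortcut is safe inside a runtime with its own event loop is exactly what the rest of this thread debates.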

This relates to my issue #7589, as it's a pretty obvious optimization that should be invisible to the user if implemented properly.

@Aerlinger
Contributor

+1. I strongly agree.

@ViralBShah
Member

We are soon going to have multi threading support. For many cases that will end up being a better solution, but fork based solutions will still be needed.

@StefanKarpinski
Sponsor Member Author

It's unclear whether it will be possible/practical to have identical semantics for multiprocess vs. multithreaded setups, so one may want to use multiple processes even if threads could be more efficient just so that the semantics are the same as they would be for a fully distributed computation.

@tkelman
Contributor

tkelman commented Jul 14, 2014

> It's unclear whether it will be possible/practical to have identical semantics for multiprocess vs. multithreaded setups

This. I'd argue they should not be. We obviously want both to be as easy to use as possible, but they're such different models that we shouldn't try to make them indistinguishable from one another. I don't think Julia's parallel performance is up to snuff with MPI for fine-grained communication (but I should really get on that NERSC paperwork to find out for sure), but all of today's serious HPC clusters are running hybrid distributed/shared setups with MPI between nodes and OpenMP within the cores on each node (and an increasing number of them throw CUDA/OpenCL/Xeon Phi accelerators into that mix too). Exposing independent hierarchical control of threads and processes to expert users will be vital to pushing Julia's capabilities further in that direction.

@StefanKarpinski
Sponsor Member Author

Yeah, that's the direction we're leaning. But that means you might still want to use the multiprocess model sometimes.

@wavexx
Contributor

wavexx commented Jul 14, 2014

I think we're on the same line of thought here. My specific usage scenario of fork() as a hidden optimization in ClusterManager was only a sensible motivation for not discarding fork as "non portable".

It's obvious that fork would allow more sophisticated forms of IPC as well (different processes with shared memory segments, descriptor passing, etc.), and combinations of all forms of parallelism can be used at the same time in complex projects (i.e., MPI allows multi-process and multi-threaded models to coexist and communicate with varying degrees of messaging cost).

Depends entirely on what the programmer wants to do and what models the underlying data/algorithm supports.
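To make the shared-memory case mentioned above concrete, here is a small Python sketch, offered only as an illustration and unrelated to Julia's internals. It assumes a Unix system, where an anonymous `mmap` is shared between parent and child after `fork`; `sum_in_child` is a hypothetical helper name.

```python
import mmap
import os
import struct


def sum_in_child(values):
    """Fork a child that writes sum(values) into memory shared with the parent."""
    # Anonymous mapping; on Unix it is shared across fork by default.
    shared = mmap.mmap(-1, 8)
    pid = os.fork()
    if pid == 0:
        # Child: store the result directly in the shared segment.
        status = 1
        try:
            shared[:8] = struct.pack("<q", sum(values))
            status = 0
        finally:
            os._exit(status)
    # Parent: wait for the child, then read the result without any messaging.
    os.waitpid(pid, 0)
    (result,) = struct.unpack("<q", shared[:8])
    shared.close()
    return result
```

The result travels back without serialization over a socket, which is the kind of cheaper IPC that forking makes available between otherwise separate processes.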

@ihnorton
Member

Re Windows: I came across some discussion of NtCreateProcess / ZwCreateProcess last night and wanted to mention it because it hasn't come up before on the list. It seems to be undocumented, and Cygwin chose not to use it. However, Scilab does use it for their Windows fork implementation, fwiw...

@wavexx
Contributor

wavexx commented Sep 8, 2014

I'll throw in some more considerations about using fork() here.

I was recently trying to use vfork+exec directly from julia.
It's obvious that there might be interactions with libuv.

Consider the following pseudocode:

1: pid = ccall(:vfork)
2: if pid == 0:
3:     if ccall(:exec) == -1: ccall(:_exit)

Is it currently safe to assume that nothing is polling descriptors/queues
between lines 1 and 3? fork implicitly clones only the current thread and
clears timers in the child process, but file descriptors remain open. I'm not
familiar enough with libuv to know whether that is enough to guarantee that
the child process will not drain the descriptors and/or attempt to restart
polling.

Do we need a macro to disable libuv within a block, and/or need a macro to disable tasks entirely?

There is also some state to be reset when forking. For instance, the list of
managed processes and , which should be cleared. Similarly, any list of threads (for
when julia will support threading) might need to be cleared.

Until then, there's not much that can be done in the forked process. For
instance, even calling exit() is unsafe, so _exit might be the only available
option right now. Am I correct?
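For reference, the fork-then-exec pattern with an `_exit` fallback looks like this in Python. This is only a sketch of the general Unix idiom: Python exposes `fork` but not `vfork`, so plain `fork` is substituted here, and nothing in it answers the libuv question raised above. `spawn` is a hypothetical helper name.

```python
import os
import sys


def spawn(argv):
    """Fork, exec `argv` in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: do as little as possible between fork and exec.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if exec failed. Leave with _exit so that no
            # atexit handlers run and no inherited stdio buffers are
            # flushed a second time.
            os._exit(127)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

A call such as `spawn([sys.executable, "-c", "raise SystemExit(7)"])` returns the child's exit code, and a failed exec is reported as 127. Using `_exit` rather than `exit` in the child is the conventional choice because the child still holds copies of the parent's buffers and cleanup handlers.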

@wavexx wavexx mentioned this issue Sep 10, 2014